LSHTM_analysis/scripts/ml/log_rpob_config.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
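Note on the SettingWithCopyWarning above: pandas raises it when an in-place operation is applied to something that may be a slice of another DataFrame. A minimal sketch of one common way to avoid it follows, assuming mask_check is built by filtering a parent DataFrame. The parent frame, its "mask" column and the values are hypothetical stand-ins, not taken from ml_data.py.

```python
import pandas as pd

# Hypothetical stand-in for the parent DataFrame used in ml_data.py
df = pd.DataFrame(
    {"ligand_distance": [12.5, 3.1, 8.4], "mask": [True, False, True]}
)

# Take an explicit copy of the slice so later changes never touch df
mask_check = df[df["mask"]].copy()

# Reassign the sorted result instead of sorting with inplace=True
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)

sorted_distances = mask_check["ligand_distance"].tolist()
print(sorted_distances)
```

Reassigning the result keeps the parent DataFrame untouched and makes the data flow explicit.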
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 46
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 282, 1: 275}) Data dim: (557, 53)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (557, 53)
Test data size: (575, 53)
y_train numbers: Counter({0: 282, 1: 275})
y_train ratio: 1.0254545454545454
y_test_numbers: Counter({0: 545, 1: 30})
y_test ratio: 18.166666666666668
-------------------------------------------------------------
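Note on the ratios above: the printed values match the count of class 0 divided by the count of class 1. A minimal sketch reproducing them from the Counter values in the log follows. The helper name class_ratio is hypothetical and the label lists are rebuilt from the counts only.

```python
from collections import Counter


def class_ratio(labels):
    """Return count(class 0) / count(class 1) for a binary label list."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Label lists rebuilt from the counts reported in the log
y_train = [0] * 282 + [1] * 275
y_test = [0] * 545 + [1] * 30

train_ratio = class_ratio(y_train)
test_ratio = class_ratio(y_test)
print(train_ratio, test_ratio)
```

The training set is close to balanced, while the blind test set has roughly 18 class-0 samples per class-1 sample, so scores on the two sets are not directly comparable.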
Simple Random OverSampling
Counter({0: 282, 1: 282})
(564, 53)
Simple Random UnderSampling
Counter({0: 275, 1: 275})
(550, 53)
Simple Combined Over and UnderSampling
Counter({0: 282, 1: 282})
(564, 53)
SMOTE_NC OverSampling
Counter({0: 282, 1: 282})
(564, 53)
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/uq_v1/
Sanity checks:
Total input features: 53
Training data size: (557, 53)
Test data size: (575, 53)
Target feature numbers (training data): Counter({0: 282, 1: 275})
Target features ratio (training data): 1.0254545454545454
Target feature numbers (test data): Counter({0: 545, 1: 30})
Target features ratio (test data): 18.166666666666668
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
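Note on the feature-count check above: the group sizes reported in this log add up to the 53 input features, and the non-categorical groups add up to the 46 numerical features. A minimal sketch of that arithmetic follows. The dictionary and variable names are hypothetical, and the script's own check may be implemented differently.

```python
# Group sizes as reported in the log
feature_groups = {
    "structural": 37,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}

total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

# Totals reported elsewhere in the log
expected_total = 53
expected_numerical = 46

features_match = (
    total_features == expected_total
    and numerical_features == expected_numerical
)
print("Pass" if features_match else "Fail")
```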
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02604318 0.02941751 0.02129436 0.0241468 0.02635837 0.02950144
0.02385283 0.02729487 0.02373672 0.02400374]
mean value: 0.025564980506896973
key: score_time
value: [0.01137805 0.01106167 0.01085043 0.0108633 0.01163006 0.01123095
0.01119161 0.01098061 0.01095009 0.01113772]
mean value: 0.011127448081970215
key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.82195294 0.71611487 0.85933785
0.75047877 0.78174603 0.71735629 0.8565805 ]
mean value: 0.8149527494116898
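Note on the mean above: it can be cross-checked against the ten per-fold test_mcc values. A minimal sketch follows. The fold values are printed to eight decimals, so the agreement is only approximate.

```python
from statistics import fmean

# Per-fold test_mcc values copied from the log (rounded to 8 decimals)
test_mcc = [
    0.93103448, 0.82149863, 0.89342711, 0.82195294, 0.71611487,
    0.85933785, 0.75047877, 0.78174603, 0.71735629, 0.8565805,
]

mean_mcc = fmean(test_mcc)
reported_mean = 0.8149527494116898

# Small difference expected because the inputs are rounded
print(mean_mcc, abs(mean_mcc - reported_mean))
```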
key: train_mcc
value: [0.8246123 0.83651026 0.81662709 0.8246123 0.84078809 0.82921429
0.8366859 0.8249619 0.81699263 0.82954689]
mean value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
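Note on the ConvergenceWarning above: the message suggests raising max_iter or scaling the data. The pipeline printed earlier already applies MinMaxScaler, so raising max_iter is the more likely remedy. A minimal sketch on synthetic data follows. The data, the max_iter value of 1000 and the choice of estimator are assumptions, since the log does not show which logistic estimator emitted the warning.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (not the rpoB data set)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Same step names as the logged pipeline, but with a higher iteration limit
pipe = Pipeline(
    steps=[
        ("prep", MinMaxScaler()),
        ("model", LogisticRegression(max_iter=1000, random_state=42)),
    ]
)

# Record warnings raised during fitting so they can be counted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
print("ConvergenceWarnings raised:", n_convergence_warnings)
```

If the warning comes from LogisticRegressionCV instead, the same max_iter parameter exists on that estimator.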
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
0.8280551641651794
key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.91071429 0.85714286 0.92857143
0.875 0.89090909 0.85454545 0.92727273]
mean value: 0.9065584415584416
key: train_accuracy
value: [0.91217565 0.91816367 0.90818363 0.91217565 0.92015968 0.91417166
0.91816367 0.9123506 0.90836653 0.91434263]
mean value: 0.9138253373730626
key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.9122807 0.86206897 0.92592593
0.87719298 0.88888889 0.86206897 0.92307692]
mean value: 0.9068817865833584
key: train_fscore
value: [0.9123506 0.91816367 0.908 0.912 0.92031873 0.91485149
0.91816367 0.9123506 0.90836653 0.91518738]
mean value: 0.9139752661367001
key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.89655172 0.83333333 0.96153846
0.86206897 0.88888889 0.80645161 0.96 ]
mean value: 0.8993978874913247
key: train_precision
value: [0.9015748 0.90909091 0.8972332 0.90118577 0.90588235 0.89534884
0.90551181 0.9015748 0.8976378 0.8957529 ]
mean value: 0.9010793179924724
key: test_recall
value: [1. 0.88888889 0.96428571 0.92857143 0.89285714 0.89285714
0.89285714 0.88888889 0.92592593 0.88888889]
mean value: 0.9164021164021164
key: train_recall
value: [0.9233871 0.92741935 0.91902834 0.92307692 0.93522267 0.93522267
0.93117409 0.9233871 0.91935484 0.93548387]
mean value: 0.9272756954420791
key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.91071429 0.85714286 0.92857143
0.875 0.89087302 0.85582011 0.9265873 ]
mean value: 0.9066616493340631
key: train_roc_auc
value: [0.91228643 0.91825513 0.90833307 0.91232586 0.92036724 0.91446173
0.91834295 0.91248095 0.90849632 0.91459233]
mean value: 0.9139942013981738
key: test_jcc
value: [0.93103448 0.82758621 0.9 0.83870968 0.75757576 0.86206897
0.78125 0.8 0.75757576 0.85714286]
mean value: 0.8312943704886141
key: train_jcc
value: [0.83882784 0.84870849 0.83150183 0.83823529 0.85239852 0.84306569
0.84870849 0.83882784 0.83211679 0.84363636]
mean value: 0.8416027146818327
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
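Note: each model block above reports its cross-validation scores as repeated "key:" / "value:" / "mean value:" triples. The following is a minimal sketch, not part of the original pipeline, of how those per-metric means could be pulled out of this log for side-by-side comparison. The function name parse_cv_means and the variable sample are hypothetical. The sketch assumes the three-line layout shown in this log and reads only the leading number on a "mean value:" line, so other output fused onto the same line is ignored.

```python
import re

# Hypothetical helper: collect {metric_name: mean_value} from "key:" / "mean value:" pairs.
def parse_cv_means(log_text):
    means = {}
    current_key = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("key:"):
            # Remember which metric the next "mean value:" line belongs to.
            current_key = line.split(":", 1)[1].strip()
        elif line.startswith("mean value:") and current_key is not None:
            # Read only the leading float; anything after it on the line is ignored.
            match = re.match(r"mean value:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)", line)
            if match:
                means[current_key] = float(match.group(1))
            current_key = None
    return means

# Excerpt copied from the metric block directly above this note.
sample = """key: test_jcc
value: [0.93103448 0.82758621 0.9 0.83870968 0.75757576 0.86206897
0.78125 0.8 0.75757576 0.85714286]
mean value: 0.8312943704886141
key: train_jcc
value: [0.83882784 0.84870849 0.83150183 0.83823529 0.85239852 0.84306569
0.84870849 0.83882784 0.83211679 0.84363636]
mean value: 0.8416027146818327"""

if __name__ == "__main__":
    for metric, mean in parse_cv_means(sample).items():
        print(metric, mean)
```

Keys repeat across model blocks, so the text would need to be split on the "Model_name:" lines before calling the helper once per model; otherwise later blocks overwrite earlier ones.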
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.6813972 0.78868866 0.81215572 0.72905326 0.82197165 0.74409461
0.71482921 0.76164103 0.68979573 0.75526428]
mean value: 0.7498891353607178
key: score_time
value: [0.01248741 0.01272368 0.01245189 0.01255608 0.01290011 0.01112986
0.01275086 0.012429 0.01255012 0.01147771]
mean value: 0.012345671653747559
key: test_mcc
value: [0.96481304 0.9284802 0.92857143 0.89802651 0.64285714 0.8660254
0.82195294 0.89153439 0.82337971 0.8565805 ]
mean value: 0.8622221274212197
key: train_mcc
value: [0.91621503 0.94017409 0.93217802 0.93212612 0.95608442 0.92815126
0.92815126 0.94040302 0.916326 0.93624587]
mean value: 0.932605508864504
key: test_accuracy
value: [0.98214286 0.96428571 0.96428571 0.94642857 0.82142857 0.92857143
0.91071429 0.94545455 0.90909091 0.92727273]
mean value: 0.9299675324675325
key: train_accuracy
value: [0.95808383 0.97005988 0.96606786 0.96606786 0.97804391 0.96407186
0.96407186 0.97011952 0.95816733 0.96812749]
mean value: 0.9662881408497745
key: test_fscore
value: [0.98113208 0.96296296 0.96428571 0.94915254 0.82142857 0.92307692
0.9122807 0.94545455 0.9122807 0.92307692]
mean value: 0.9295131661638991
key: train_fscore
value: [0.95740365 0.96957404 0.96537678 0.96551724 0.97768763 0.96341463
0.96341463 0.9694501 0.95757576 0.96774194]
mean value: 0.9657156401043632
key: test_precision
value: [1. 0.96296296 0.96428571 0.90322581 0.82142857 1.
0.89655172 0.92857143 0.86666667 0.96 ]
mean value: 0.9303692874504887
key: train_precision
value: [0.96326531 0.9755102 0.97131148 0.96747967 0.9796748 0.96734694
0.96734694 0.97942387 0.95951417 0.96774194]
mean value: 0.9698615308546767
key: test_recall
value: [0.96296296 0.96296296 0.96428571 1. 0.82142857 0.85714286
0.92857143 0.96296296 0.96296296 0.88888889]
mean value: 0.9312169312169312
key: train_recall
value: [0.9516129 0.96370968 0.95951417 0.96356275 0.9757085 0.95951417
0.95951417 0.95967742 0.95564516 0.96774194]
mean value: 0.961620086195638
key: test_roc_auc
value: [0.98148148 0.9642401 0.96428571 0.94642857 0.82142857 0.92857143
0.91071429 0.9457672 0.91005291 0.9265873 ]
mean value: 0.9299557562488597
key: train_roc_auc
value: [0.95801989 0.96999713 0.96597756 0.96603335 0.97801173 0.96400905
0.96400905 0.96999619 0.95813754 0.96812294]
mean value: 0.9662314429920021
key: test_jcc
value: [0.96296296 0.92857143 0.93103448 0.90322581 0.6969697 0.85714286
0.83870968 0.89655172 0.83870968 0.85714286]
mean value: 0.8711021170976677
key: train_jcc
value: [0.91828794 0.94094488 0.93307087 0.93333333 0.95634921 0.92941176
0.92941176 0.94071146 0.91860465 0.9375 ]
mean value: 0.9337625868482374
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.02240419 0.00845432 0.00848174 0.0077436 0.00821161 0.00828028
0.00818753 0.00795102 0.00834084 0.00831008]
mean value: 0.009636521339416504
key: score_time
value: [0.01104975 0.00858045 0.00879431 0.0087378 0.00867748 0.0087285
0.00872374 0.00838089 0.00867343 0.00884008]
mean value: 0.0089186429977417
key: test_mcc
value: [0.74266517 0.48372032 0.77459667 0.71611487 0.40574111 0.61065803
0.55328334 0.68300095 0.74935731 0.74935731]
mean value: 0.6468495081997219
key: train_mcc
value: [0.66487805 0.68935419 0.66458942 0.66570983 0.62725669 0.69324149
0.67986963 0.68418537 0.66184784 0.66877084]
mean value: 0.6699703343840876
key: test_accuracy
value: [0.85714286 0.73214286 0.875 0.85714286 0.69642857 0.80357143
0.76785714 0.83636364 0.87272727 0.87272727]
mean value: 0.8171103896103896
key: train_accuracy
value: [0.82634731 0.83832335 0.82634731 0.8243513 0.79840319 0.84231537
0.83433134 0.83665339 0.8247012 0.82669323]
mean value: 0.8278466970441587
key: test_fscore
value: [0.82608696 0.66666667 0.85714286 0.85185185 0.65306122 0.79245283
0.73469388 0.81632653 0.8627451 0.8627451 ]
mean value: 0.7923772991103286
key: train_fscore
value: [0.80536913 0.81879195 0.80449438 0.79816514 0.75662651 0.82560706
0.81431767 0.81777778 0.80269058 0.80272109]
mean value: 0.8046561286055279
key: test_precision
value: [1. 0.83333333 1. 0.88461538 0.76190476 0.84
0.85714286 0.90909091 0.91666667 0.91666667]
mean value: 0.8919420579420579
key: train_precision
value: [0.90452261 0.91959799 0.9040404 0.92063492 0.93452381 0.90776699
0.91 0.91089109 0.9040404 0.91709845]
mean value: 0.9133116666250641
key: test_recall
value: [0.7037037 0.55555556 0.75 0.82142857 0.57142857 0.75
0.64285714 0.74074074 0.81481481 0.81481481]
mean value: 0.7165343915343916
key: train_recall
value: [0.72580645 0.73790323 0.72469636 0.70445344 0.63562753 0.75708502
0.73684211 0.74193548 0.72177419 0.71370968]
mean value: 0.7199833485699361
key: test_roc_auc
value: [0.85185185 0.72605364 0.875 0.85714286 0.69642857 0.80357143
0.76785714 0.83465608 0.87169312 0.87169312]
mean value: 0.8155947819740923
key: train_roc_auc
value: [0.82535382 0.83733106 0.8249466 0.82269916 0.79616022 0.84114094
0.83298798 0.83553467 0.82348552 0.82535878]
mean value: 0.8264998750879309
key: test_jcc
value: [0.7037037 0.5 0.75 0.74193548 0.48484848 0.65625
0.58064516 0.68965517 0.75862069 0.75862069]
mean value: 0.6624279385437617
key: train_jcc
value: [0.6741573 0.69318182 0.67293233 0.66412214 0.60852713 0.70300752
0.68679245 0.69172932 0.67041199 0.67045455]
mean value: 0.6735316546975922
MCC on Blind test: 0.34
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00915837 0.008641 0.00853086 0.00855017 0.00856733 0.00867844
0.00850987 0.00852513 0.00848937 0.00854254]
mean value: 0.008619308471679688
key: score_time
value: [0.00892973 0.00885773 0.00864935 0.00866771 0.00876236 0.00893545
0.0087409 0.00873089 0.00877738 0.00871181]
mean value: 0.008776330947875976
key: test_mcc
value: [0.89342711 0.74984143 0.85714286 0.71428571 0.67900461 0.78571429
0.64285714 0.71049701 0.75878131 0.74935731]
mean value: 0.7540908782235038
key: train_mcc
value: [0.76073062 0.76464682 0.75244668 0.78078676 0.77655234 0.75249829
0.76042979 0.77325226 0.78086182 0.77758373]
mean value: 0.7679789125420294
key: test_accuracy
value: [0.94642857 0.875 0.92857143 0.85714286 0.83928571 0.89285714
0.82142857 0.85454545 0.87272727 0.87272727]
mean value: 0.8760714285714286
key: train_accuracy
value: [0.88023952 0.88223553 0.8762475 0.89021956 0.88822355 0.8762475
0.88023952 0.88645418 0.89043825 0.88844622]
mean value: 0.8838991340029105
key: test_fscore
value: [0.94545455 0.86792453 0.92857143 0.85714286 0.84210526 0.89285714
0.82142857 0.84615385 0.88135593 0.8627451 ]
mean value: 0.8745739213310778
key: train_fscore
value: [0.88047809 0.88223553 0.87449393 0.89021956 0.8875502 0.875
0.87804878 0.88667992 0.88933602 0.88932806]
mean value: 0.8833370085701109
key: test_precision
value: [0.92857143 0.88461538 0.92857143 0.85714286 0.82758621 0.89285714
0.82142857 0.88 0.8125 0.91666667]
mean value: 0.8749939686750031
key: train_precision
value: [0.87007874 0.87351779 0.87449393 0.87795276 0.88047809 0.87148594
0.88163265 0.8745098 0.8875502 0.87209302]
mean value: 0.8763792922216086
key: test_recall
value: [0.96296296 0.85185185 0.92857143 0.85714286 0.85714286 0.89285714
0.82142857 0.81481481 0.96296296 0.81481481]
mean value: 0.8764550264550264
key: train_recall
value: [0.89112903 0.89112903 0.87449393 0.90283401 0.89473684 0.87854251
0.87449393 0.89919355 0.89112903 0.90725806]
mean value: 0.8904939924252318
key: test_roc_auc
value: [0.94699872 0.87420179 0.92857143 0.85714286 0.83928571 0.89285714
0.82142857 0.85383598 0.87433862 0.87169312]
mean value: 0.8760353950009122
key: train_roc_auc
value: [0.88034712 0.88232341 0.87622334 0.89039338 0.8883133 0.87627913
0.88016035 0.88660465 0.89044641 0.8886684 ]
mean value: 0.8839759495598507
key: test_jcc
value: [0.89655172 0.76666667 0.86666667 0.75 0.72727273 0.80645161
0.6969697 0.73333333 0.78787879 0.75862069]
mean value: 0.7790411905484208
key: train_jcc
value: [0.78647687 0.78928571 0.77697842 0.80215827 0.79783394 0.77777778
0.7826087 0.79642857 0.80072464 0.80071174]
mean value: 0.7910984634590573
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00800991 0.00789165 0.00820088 0.00811696 0.00825524 0.00775909
0.00819016 0.00785351 0.00778246 0.00778031]
mean value: 0.007984018325805664
key: score_time
value: [0.08504152 0.01459813 0.01290846 0.01306796 0.0130856 0.01207209
0.01181221 0.01277232 0.01141 0.01144505]
mean value: 0.01982133388519287
key: test_mcc
value: [0.85696041 0.74984143 0.78772636 0.67900461 0.75047877 0.78571429
0.64450339 0.74569602 0.65330526 0.78353876]
mean value: 0.7436769286015497
key: train_mcc
value: [0.79646836 0.79243629 0.77242951 0.80040802 0.78877235 0.78048897
0.78837632 0.78487523 0.80887676 0.7817104 ]
mean value: 0.7894842220535155
key: test_accuracy
value: [0.92857143 0.875 0.89285714 0.83928571 0.875 0.89285714
0.82142857 0.87272727 0.81818182 0.89090909]
mean value: 0.8706818181818182
key: train_accuracy
value: [0.89820359 0.89620758 0.88622754 0.9001996 0.89421158 0.89021956
0.89421158 0.89243028 0.90438247 0.89043825]
mean value: 0.8946732033940088
key: test_fscore
value: [0.92592593 0.86792453 0.89655172 0.83636364 0.87272727 0.89285714
0.82758621 0.86792453 0.83333333 0.88461538]
mean value: 0.8705809683460952
key: train_fscore
value: [0.89779559 0.89558233 0.88484848 0.89919355 0.89421158 0.88933602
0.89249493 0.89156627 0.904 0.89151874]
mean value: 0.8940547478417012
key: test_precision
value: [0.92592593 0.88461538 0.86666667 0.85185185 0.88888889 0.89285714
0.8 0.88461538 0.75757576 0.92 ]
mean value: 0.8672997002997003
key: train_precision
value: [0.89243028 0.892 0.88306452 0.89558233 0.88188976 0.884
0.89430894 0.888 0.8968254 0.87258687]
mean value: 0.8880688100611991
key: test_recall
value: [0.92592593 0.85185185 0.92857143 0.82142857 0.85714286 0.89285714
0.85714286 0.85185185 0.92592593 0.85185185]
mean value: 0.8764550264550265
key: train_recall
value: [0.90322581 0.89919355 0.88663968 0.90283401 0.90688259 0.89473684
0.89068826 0.89516129 0.91129032 0.91129032]
mean value: 0.9001942666840799
key: test_roc_auc
value: [0.9284802 0.87420179 0.89285714 0.83928571 0.875 0.89285714
0.82142857 0.8723545 0.82010582 0.89021164]
mean value: 0.8706782521437694
key: train_roc_auc
value: [0.89825322 0.89623709 0.88623322 0.9002359 0.89438618 0.89028181
0.89416303 0.89246253 0.90446406 0.89068453]
mean value: 0.8947401572130679
key: test_jcc
value: [0.86206897 0.76666667 0.8125 0.71875 0.77419355 0.80645161
0.70588235 0.76666667 0.71428571 0.79310345]
mean value: 0.772056897564365
key: train_jcc
value: [0.81454545 0.81090909 0.79347826 0.81684982 0.80866426 0.80072464
0.80586081 0.80434783 0.82481752 0.80427046]
mean value: 0.8084468133612275
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01613522 0.01378536 0.01915169 0.01421118 0.01374722 0.01693225
0.014081 0.0139122 0.01385522 0.01448727]
mean value: 0.01502985954284668
key: score_time
value: [0.00889826 0.00894713 0.00886488 0.00883365 0.00874567 0.00876617
0.00901341 0.00938988 0.00874805 0.00884128]
mean value: 0.008904838562011718
key: test_mcc
value: [0.89342711 0.82149863 0.89342711 0.71428571 0.67900461 0.85714286
0.71611487 0.71049701 0.71735629 0.74935731]
mean value: 0.7752111518648109
key: train_mcc
value: [0.77670104 0.78487855 0.77670104 0.79675795 0.80065667 0.78078676
0.79658289 0.79328084 0.79284399 0.78122197]
mean value: 0.7880411697036507
key: test_accuracy
value: [0.94642857 0.91071429 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85454545 0.85454545 0.87272727]
mean value: 0.8867532467532467
key: train_accuracy
value: [0.88822355 0.89221557 0.88822355 0.89820359 0.9001996 0.89021956
0.89820359 0.89641434 0.89641434 0.89043825]
mean value: 0.8938755954227005
key: test_fscore
value: [0.94545455 0.90566038 0.94736842 0.85714286 0.84210526 0.92857143
0.86206897 0.84615385 0.86206897 0.8627451 ]
mean value: 0.8859339767965393
key: train_fscore
value: [0.88844622 0.89285714 0.888 0.89820359 0.9 0.89021956
0.89779559 0.8968254 0.89558233 0.89065606]
mean value: 0.893858589263252
key: test_precision
value: [0.92857143 0.92307692 0.93103448 0.85714286 0.82758621 0.92857143
0.83333333 0.88 0.80645161 0.91666667]
mean value: 0.8832434939921036
key: train_precision
value: [0.87795276 0.87890625 0.87747036 0.88582677 0.88932806 0.87795276
0.88888889 0.8828125 0.892 0.87843137]
mean value: 0.8829569713874807
key: test_recall
value: [0.96296296 0.88888889 0.96428571 0.85714286 0.85714286 0.92857143
0.89285714 0.81481481 0.92592593 0.81481481]
mean value: 0.8907407407407407
key: train_recall
value: [0.89919355 0.90725806 0.89878543 0.91093117 0.91093117 0.90283401
0.90688259 0.91129032 0.89919355 0.90322581]
mean value: 0.9050525662792216
key: test_roc_auc
value: [0.94699872 0.90996169 0.94642857 0.85714286 0.83928571 0.92857143
0.85714286 0.85383598 0.85582011 0.87169312]
mean value: 0.8866881043605181
key: train_roc_auc
value: [0.88833195 0.89236421 0.88836909 0.89837897 0.90034748 0.89039338
0.89832319 0.89659004 0.89644717 0.89058928]
mean value: 0.8940134761930483
key: test_jcc
value: [0.89655172 0.82758621 0.9 0.75 0.72727273 0.86666667
0.75757576 0.73333333 0.75757576 0.75862069]
mean value: 0.7975182863113898
key: train_jcc
value: [0.79928315 0.80645161 0.79856115 0.81521739 0.81818182 0.80215827
0.81454545 0.81294964 0.81090909 0.80286738]
mean value: 0.8081124970226548
MCC on Blind test: 0.22
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.44565105 1.50711632 1.54303265 1.40061927 1.64806414 1.59813452
1.42913032 1.62204456 1.88645577 1.49996781]
mean value: 1.558021640777588
key: score_time
value: [0.01188302 0.01363969 0.01324034 0.01340175 0.01373863 0.01372957
0.0137701 0.02073598 0.01149464 0.0171504 ]
mean value: 0.014278411865234375
key: test_mcc
value: [0.96490128 0.89342711 0.82195294 0.93094934 0.75047877 0.83484711
0.82195294 0.81878307 0.79069197 0.8565805 ]
mean value: 0.8484565042398484
key: train_mcc
value: [0.96407453 0.96407453 0.97604323 0.96407052 0.97205662 0.97604323
0.96809206 0.96812294 0.96812294 0.96018795]
mean value: 0.9680888542163917
key: test_accuracy
value: [0.98214286 0.94642857 0.91071429 0.96428571 0.875 0.91071429
0.91071429 0.90909091 0.89090909 0.92727273]
mean value: 0.9227272727272727
key: train_accuracy
value: [0.98203593 0.98203593 0.98802395 0.98203593 0.98602794 0.98802395
0.98403194 0.98406375 0.98406375 0.98007968]
mean value: 0.9840422740177016
key: test_fscore
value: [0.98181818 0.94545455 0.9122807 0.96551724 0.87719298 0.90196078
0.9122807 0.90909091 0.89655172 0.92307692]
mean value: 0.9225224695236438
key: train_fscore
value: [0.98181818 0.98181818 0.98785425 0.98174442 0.98580122 0.98785425
0.98387097 0.98387097 0.98387097 0.97991968]
mean value: 0.9838423086546554
key: test_precision
value: [0.96428571 0.92857143 0.89655172 0.93333333 0.86206897 1.
0.89655172 0.89285714 0.83870968 0.96 ]
mean value: 0.9172929710260077
key: train_precision
value: [0.98380567 0.98380567 0.98785425 0.98373984 0.98780488 0.98785425
0.97991968 0.98387097 0.98387097 0.976 ]
mean value: 0.9838526167702565
key: test_recall
value: [1. 0.96296296 0.92857143 1. 0.89285714 0.82142857
0.92857143 0.92592593 0.96296296 0.88888889]
mean value: 0.9312169312169312
key: train_recall
value: [0.97983871 0.97983871 0.98785425 0.97975709 0.98380567 0.98785425
0.98785425 0.98387097 0.98387097 0.98387097]
mean value: 0.983841582865352
key: test_roc_auc
value: [0.98275862 0.94699872 0.91071429 0.96428571 0.875 0.91071429
0.91071429 0.90939153 0.89219577 0.9265873 ]
mean value: 0.9229360518153622
key: train_roc_auc
value: [0.98201422 0.98201422 0.98802161 0.98200453 0.98599732 0.98802161
0.98408461 0.98406147 0.98406147 0.98012446]
mean value: 0.9840405511662667
key: test_jcc
value: [0.96428571 0.89655172 0.83870968 0.93333333 0.78125 0.82142857
0.83870968 0.83333333 0.8125 0.85714286]
mean value: 0.857724488850045
key: train_jcc
value: [0.96428571 0.96428571 0.976 0.96414343 0.972 0.976
0.96825397 0.96825397 0.96825397 0.96062992]
mean value: 0.9682106680887996
MCC on Blind test: 0.24
Accuracy on Blind test: 0.62
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01453757 0.01242089 0.01000428 0.01067281 0.00980949 0.01040387
0.01071548 0.0110333 0.01046252 0.01084757]
mean value: 0.011090779304504394
key: score_time
value: [0.01091075 0.00837231 0.00808406 0.00912213 0.00790167 0.00789714
0.00791621 0.00794125 0.00791454 0.00789595]
mean value: 0.00839560031890869
key: test_mcc
value: [1. 0.85696041 0.78772636 0.92857143 0.82195294 0.89802651
0.79385662 0.89153439 1. 0.74935731]
mean value: 0.8727985977030033
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.92857143 0.89285714 0.96428571 0.91071429 0.94642857
0.89285714 0.94545455 1. 0.87272727]
mean value: 0.9353896103896104
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.92592593 0.89655172 0.96428571 0.9122807 0.94339623
0.9 0.94545455 1. 0.8627451 ]
mean value: 0.9350639936012812
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92592593 0.86666667 0.96428571 0.89655172 1.
0.84375 0.92857143 1. 0.91666667]
mean value: 0.9342418126254333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.92592593 0.92857143 0.96428571 0.92857143 0.89285714
0.96428571 0.96296296 1. 0.81481481]
mean value: 0.9382275132275132
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9284802 0.89285714 0.96428571 0.91071429 0.94642857
0.89285714 0.9457672 1. 0.87169312]
mean value: 0.9353083378945448
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.86206897 0.8125 0.93103448 0.83870968 0.89285714
0.81818182 0.89655172 1. 0.75862069]
mean value: 0.8810524500527281
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10190248 0.09926343 0.10202909 0.10004902 0.10008192 0.10025787
0.10022664 0.09921432 0.10032749 0.10042095]
mean value: 0.10037732124328613
key: score_time
value: [0.01691294 0.01694822 0.0180881 0.01682162 0.01716375 0.01716638
0.01710129 0.0170927 0.01716757 0.01707959]
mean value: 0.017154216766357422
key: test_mcc
value: [0.93103448 0.78544061 0.89342711 0.89342711 0.78571429 0.82195294
0.82618439 0.85449735 0.82337971 0.78961518]
mean value: 0.840467318391085
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96428571 0.89285714 0.94642857 0.94642857 0.89285714 0.91071429
0.91071429 0.92727273 0.90909091 0.89090909]
mean value: 0.9191558441558442
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.88888889 0.94736842 0.94736842 0.89285714 0.90909091
0.91525424 0.92592593 0.9122807 0.88 ]
mean value: 0.9183320362196365
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.88888889 0.93103448 0.93103448 0.89285714 0.92592593
0.87096774 0.92592593 0.86666667 0.95652174]
mean value: 0.9120857479606331
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.88888889 0.96428571 0.96428571 0.89285714 0.89285714
0.96428571 0.92592593 0.96296296 0.81481481]
mean value: 0.9271164021164021
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96551724 0.89272031 0.94642857 0.94642857 0.89285714 0.91071429
0.91071429 0.92724868 0.91005291 0.88955026]
mean value: 0.9192232256887429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.8 0.9 0.9 0.80645161 0.83333333
0.84375 0.86206897 0.83870968 0.78571429]
mean value: 0.8501062357646062
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.72
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0078423 0.00767851 0.00772405 0.00781775 0.00766301 0.00781226
0.00765562 0.00782037 0.00784039 0.00778151]
mean value: 0.007763576507568359
key: score_time
value: [0.0079782 0.00796032 0.00801039 0.0080893 0.00807023 0.0080111
0.00799298 0.00796652 0.00790787 0.00800228]
mean value: 0.007998919486999512
key: test_mcc
value: [0.96490128 0.82661701 0.85933785 0.75047877 0.4645821 0.75434227
0.67900461 0.58684513 0.85695439 0.82269299]
mean value: 0.7565756396515464
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98214286 0.91071429 0.92857143 0.875 0.73214286 0.875
0.83928571 0.78181818 0.92727273 0.90909091]
mean value: 0.876103896103896
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.9122807 0.93103448 0.87719298 0.72727273 0.86792453
0.84210526 0.73913043 0.92857143 0.90196078]
mean value: 0.87092915151876
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.86666667 0.9 0.86206897 0.74074074 0.92
0.82758621 0.89473684 0.89655172 0.95833333]
mean value: 0.8830970193683443
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 0.89285714 0.71428571 0.82142857
0.85714286 0.62962963 0.96296296 0.85185185]
mean value: 0.8657407407407407
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98275862 0.91251596 0.92857143 0.875 0.73214286 0.875
0.83928571 0.77910053 0.92791005 0.90806878]
mean value: 0.8760353950009123
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.83870968 0.87096774 0.78125 0.57142857 0.76666667
0.72727273 0.5862069 0.86666667 0.82142857]
mean value: 0.7794883233655481
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.24
Accuracy on Blind test: 0.73
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.32765579 1.30277157 1.36390686 1.35351229 1.34183788 1.42637658
1.35306215 1.29488921 1.30548406 1.29170752]
mean value: 1.3361203908920287
key: score_time
value: [0.0910337 0.09533978 0.09802961 0.09628296 0.09906578 0.09774327
0.09189868 0.09146214 0.09067702 0.0920558 ]
mean value: 0.09435887336730957
key: test_mcc
value: [1. 0.89342711 0.92857143 0.93094934 0.78571429 0.93094934
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.9217438682406724
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.94642857 0.96428571 0.96428571 0.89285714 0.96428571
0.98214286 0.94545455 1. 0.94545455]
mean value: 0.9605194805194806
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94545455 0.96428571 0.96551724 0.89285714 0.96296296
0.98245614 0.94545455 1. 0.94339623]
mean value: 0.9602384519160193
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92857143 0.96428571 0.93333333 0.89285714 1.
0.96551724 0.92857143 1. 0.96153846]
mean value: 0.957467475053682
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.89285714 0.92857143
1. 0.96296296 1. 0.92592593]
mean value: 0.9637566137566138
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.94699872 0.96428571 0.96428571 0.89285714 0.96428571
0.98214286 0.9457672 1. 0.94510582]
mean value: 0.960572888159095
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.89655172 0.93103448 0.93333333 0.80645161 0.92857143
0.96551724 0.89655172 1. 0.89285714]
mean value: 0.9250868690078924
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.49
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [1.78958488 0.90520978 0.92161393 0.90113878 1.04687214 0.9353826
0.92672181 0.89199233 0.91325641 0.93072248]
mean value: 1.0162495136260987
key: score_time
value: [0.24019027 0.16753531 0.25853181 0.21053815 0.25056458 0.25201178
0.21247077 0.21787858 0.26128078 0.26930881]
mean value: 0.234031081199646
key: test_mcc
value: [1. 0.89342711 0.92857143 0.93094934 0.85714286 0.93094934
0.96490128 0.89153439 1. 0.8565805 ]
mean value: 0.9254056246826363
key: train_mcc
value: [0.94423549 0.94817282 0.94817035 0.94817035 0.95628198 0.94423372
0.94817035 0.95231443 0.94043131 0.94434567]
mean value: 0.9474526465723194
key: test_accuracy
value: [1. 0.94642857 0.96428571 0.96428571 0.92857143 0.96428571
0.98214286 0.94545455 1. 0.92727273]
mean value: 0.9622727272727273
key: train_accuracy
value: [0.97205589 0.9740519 0.9740519 0.9740519 0.97804391 0.97205589
0.9740519 0.97609562 0.97011952 0.97211155]
mean value: 0.9736689966680185
key: test_fscore
value: [1. 0.94545455 0.96428571 0.96551724 0.92857143 0.96296296
0.98245614 0.94545455 1. 0.92307692]
mean value: 0.9617779501536308
key: train_fscore
value: [0.972 0.9739479 0.97384306 0.97384306 0.97795591 0.97188755
0.97384306 0.976 0.97005988 0.972 ]
mean value: 0.9735380413105856
key: test_precision
value: [1. 0.92857143 0.96428571 0.93333333 0.92857143 1.
0.96551724 0.92857143 1. 0.96 ]
mean value: 0.9608850574712644
key: train_precision
value: [0.96428571 0.96812749 0.968 0.968 0.96825397 0.96414343
0.968 0.96825397 0.96047431 0.96428571]
mean value: 0.9661824589714422
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.92857143 0.92857143
1. 0.96296296 1. 0.88888889]
mean value: 0.9636243386243386
key: train_recall
value: [0.97983871 0.97983871 0.97975709 0.97975709 0.98785425 0.97975709
0.97975709 0.98387097 0.97983871 0.97983871]
mean value: 0.981010839754473
key: test_roc_auc
value: [1. 0.94699872 0.96428571 0.96428571 0.92857143 0.96428571
0.98214286 0.9457672 1. 0.9265873 ]
mean value: 0.9622924648786718
key: train_roc_auc
value: [0.97213279 0.97410908 0.97413051 0.97413051 0.97817909 0.97216201
0.97413051 0.97618745 0.97023432 0.97220282]
mean value: 0.9737599093111167
key: test_jcc
value: [1. 0.89655172 0.93103448 0.93333333 0.86666667 0.92857143
0.96551724 0.89655172 1. 0.85714286]
mean value: 0.9275369458128079
key: train_jcc
value: [0.94552529 0.94921875 0.94901961 0.94901961 0.95686275 0.9453125
0.94901961 0.953125 0.94186047 0.94552529]
mean value: 0.9484488867401317
MCC on Blind test: 0.2
Accuracy on Blind test: 0.5
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01942348 0.00846815 0.0079608 0.00791335 0.00764561 0.00800776
0.00769567 0.00761008 0.00771356 0.00770426]
mean value: 0.009014272689819336
key: score_time
value: [0.01135564 0.00861812 0.0081501 0.00827861 0.00797629 0.00797629
0.00789309 0.00791764 0.00794172 0.0080111 ]
mean value: 0.008411860466003418
key: test_mcc
value: [0.89342711 0.74984143 0.85714286 0.71428571 0.67900461 0.78571429
0.64285714 0.71049701 0.75878131 0.74935731]
mean value: 0.7540908782235038
key: train_mcc
value: [0.76073062 0.76464682 0.75244668 0.78078676 0.77655234 0.75249829
0.76042979 0.77325226 0.78086182 0.77758373]
mean value: 0.7679789125420294
key: test_accuracy
value: [0.94642857 0.875 0.92857143 0.85714286 0.83928571 0.89285714
0.82142857 0.85454545 0.87272727 0.87272727]
mean value: 0.8760714285714286
key: train_accuracy
value: [0.88023952 0.88223553 0.8762475 0.89021956 0.88822355 0.8762475
0.88023952 0.88645418 0.89043825 0.88844622]
mean value: 0.8838991340029105
key: test_fscore
value: [0.94545455 0.86792453 0.92857143 0.85714286 0.84210526 0.89285714
0.82142857 0.84615385 0.88135593 0.8627451 ]
mean value: 0.8745739213310778
key: train_fscore
value: [0.88047809 0.88223553 0.87449393 0.89021956 0.8875502 0.875
0.87804878 0.88667992 0.88933602 0.88932806]
mean value: 0.8833370085701109
key: test_precision
value: [0.92857143 0.88461538 0.92857143 0.85714286 0.82758621 0.89285714
0.82142857 0.88 0.8125 0.91666667]
mean value: 0.8749939686750031
key: train_precision
value: [0.87007874 0.87351779 0.87449393 0.87795276 0.88047809 0.87148594
0.88163265 0.8745098 0.8875502 0.87209302]
mean value: 0.8763792922216086
key: test_recall
value: [0.96296296 0.85185185 0.92857143 0.85714286 0.85714286 0.89285714
0.82142857 0.81481481 0.96296296 0.81481481]
mean value: 0.8764550264550264
key: train_recall
value: [0.89112903 0.89112903 0.87449393 0.90283401 0.89473684 0.87854251
0.87449393 0.89919355 0.89112903 0.90725806]
mean value: 0.8904939924252318
key: test_roc_auc
value: [0.94699872 0.87420179 0.92857143 0.85714286 0.83928571 0.89285714
0.82142857 0.85383598 0.87433862 0.87169312]
mean value: 0.8760353950009122
key: train_roc_auc
value: [0.88034712 0.88232341 0.87622334 0.89039338 0.8883133 0.87627913
0.88016035 0.88660465 0.89044641 0.8886684 ]
mean value: 0.8839759495598507
key: test_jcc
value: [0.89655172 0.76666667 0.86666667 0.75 0.72727273 0.80645161
0.6969697 0.73333333 0.78787879 0.75862069]
mean value: 0.7790411905484208
key: train_jcc
value: [0.78647687 0.78928571 0.77697842 0.80215827 0.79783394 0.77777778
0.7826087 0.79642857 0.80072464 0.80071174]
mean value: 0.7910984634590573
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.13241124 0.04881454 0.04856825 0.05027437 0.04823184 0.0521431
0.05157948 0.04875374 0.05209494 0.05204916]
mean value: 0.058492064476013184
key: score_time
value: [0.01028204 0.01021361 0.00997639 0.00988674 0.00968552 0.00974798
0.00975752 0.00969625 0.01008534 0.00979543]
mean value: 0.009912681579589844
key: test_mcc
value: [1. 0.9284802 0.89342711 0.93094934 0.89342711 0.93094934
0.92857143 0.89153439 1. 0.89139151]
mean value: 0.9288730432045526
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.96428571 0.94642857 0.96428571 0.94642857 0.96428571
0.96428571 0.94545455 1. 0.94545455]
mean value: 0.9640909090909091
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.96296296 0.94736842 0.96551724 0.94736842 0.96296296
0.96428571 0.94545455 1. 0.94339623]
mean value: 0.9639316495565854
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.93103448 0.93333333 0.93103448 1.
0.96428571 0.92857143 1. 0.96153846]
mean value: 0.9612760866209142
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.96428571 0.92857143
0.96428571 0.96296296 1. 0.92592593]
mean value: 0.9673280423280424
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9642401 0.94642857 0.96428571 0.94642857 0.96428571
0.96428571 0.9457672 1. 0.94510582]
mean value: 0.9640827403758438
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.92857143 0.9 0.93333333 0.9 0.92857143
0.93103448 0.89655172 1. 0.89285714]
mean value: 0.9310919540229885
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.37
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01652741 0.03145742 0.04161358 0.04130459 0.0417943 0.04165006
0.03727698 0.04370403 0.04243636 0.04148102]
mean value: 0.037924575805664065
key: score_time
value: [0.01045966 0.01935315 0.02035856 0.02224278 0.02057624 0.0211885
0.01432323 0.01109576 0.01095986 0.01956701]
mean value: 0.017012476921081543
key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.82195294 0.67900461 0.89342711
0.67900461 0.78174603 0.71735629 0.82269299]
mean value: 0.8041144809910427
key: train_mcc
value: [0.86087113 0.84902508 0.84841579 0.8325975 0.86886449 0.85702217
0.85676029 0.85318007 0.84497964 0.84964116]
mean value: 0.8521357324796069
key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.91071429 0.83928571 0.94642857
0.83928571 0.89090909 0.85454545 0.90909091]
mean value: 0.9011688311688312
key: train_accuracy
value: [0.93013972 0.9241517 0.9241517 0.91616766 0.93413174 0.92814371
0.92814371 0.92629482 0.92231076 0.92430279]
mean value: 0.9257938306653625
key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.9122807 0.84210526 0.94545455
0.84210526 0.88888889 0.86206897 0.90196078]
mean value: 0.9012178924941413
key: train_fscore
value: [0.93069307 0.92490119 0.92369478 0.916 0.93439364 0.92857143
0.92828685 0.92673267 0.92246521 0.92519685]
mean value: 0.9260935685934734
key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.89655172 0.82758621 0.96296296
0.82758621 0.88888889 0.80645161 0.95833333]
mean value: 0.895350682461361
key: train_precision
value: [0.91439689 0.90697674 0.91633466 0.90513834 0.91796875 0.91050584
0.91372549 0.91050584 0.90980392 0.90384615]
mean value: 0.9109202621383721
key: test_recall
value: [1. 0.88888889 0.96428571 0.92857143 0.85714286 0.92857143
0.85714286 0.88888889 0.92592593 0.85185185]
mean value: 0.9091269841269841
key: train_recall
value: [0.94758065 0.94354839 0.93117409 0.92712551 0.951417 0.94736842
0.94331984 0.94354839 0.93548387 0.94758065]
mean value: 0.9418146793783466
key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.91071429 0.83928571 0.94642857
0.83928571 0.89087302 0.85582011 0.90806878]
mean value: 0.9012383689107828
key: train_roc_auc
value: [0.93031206 0.92434336 0.92424846 0.91631866 0.93436992 0.92840862
0.92835283 0.9264986 0.92246634 0.92457772]
mean value: 0.925989658944721
key: test_jcc
value: [0.93103448 0.82758621 0.9 0.83870968 0.72727273 0.89655172
0.72727273 0.8 0.75757576 0.82142857]
mean value: 0.8227431874762242
key: train_jcc
value: [0.87037037 0.86029412 0.85820896 0.84501845 0.87686567 0.86666667
0.866171 0.86346863 0.85608856 0.86080586]
mean value: 0.8623958291829558
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02321148 0.00787091 0.00767159 0.00766253 0.0082829 0.00821114
0.00813293 0.00844979 0.00815034 0.00824094]
mean value: 0.00958845615386963
key: score_time
value: [0.00829506 0.00817847 0.00787377 0.00799441 0.0085752 0.0085206
0.00833821 0.00848675 0.00863934 0.00852776]
mean value: 0.008342957496643067
key: test_mcc
value: [0.89342711 0.74984143 0.89342711 0.71428571 0.67900461 0.82195294
0.71611487 0.71049701 0.75878131 0.74935731]
mean value: 0.7686689426300658
key: train_mcc
value: [0.76059032 0.77655946 0.76451932 0.78061298 0.78453717 0.7684682
0.78839993 0.78902126 0.77686055 0.77734028]
mean value: 0.7766909479185805
key: test_accuracy
value: [0.94642857 0.875 0.94642857 0.85714286 0.83928571 0.91071429
0.85714286 0.85454545 0.87272727 0.87272727]
mean value: 0.8832142857142857
key: train_accuracy
value: [0.88023952 0.88822355 0.88223553 0.89021956 0.89221557 0.88423154
0.89421158 0.89442231 0.88844622 0.88844622]
mean value: 0.8882891587343242
key: test_fscore
value: [0.94545455 0.86792453 0.94736842 0.85714286 0.84210526 0.9122807
0.86206897 0.84615385 0.88135593 0.8627451 ]
mean value: 0.8824600158777894
key: train_fscore
value: [0.88 0.888 0.88128773 0.88977956 0.89156627 0.88306452
0.89292929 0.89421158 0.88709677 0.88888889]
mean value: 0.8876824599523696
key: test_precision
value: [0.92857143 0.88461538 0.93103448 0.85714286 0.82758621 0.89655172
0.83333333 0.88 0.8125 0.91666667]
mean value: 0.8768002084122773
key: train_precision
value: [0.87301587 0.88095238 0.876 0.88095238 0.88446215 0.87951807
0.89112903 0.88537549 0.88709677 0.875 ]
mean value: 0.8813502159126972
key: test_recall
value: [0.96296296 0.85185185 0.96428571 0.85714286 0.85714286 0.92857143
0.89285714 0.81481481 0.96296296 0.81481481]
mean value: 0.8907407407407407
key: train_recall
value: [0.88709677 0.89516129 0.88663968 0.89878543 0.89878543 0.88663968
0.89473684 0.90322581 0.88709677 0.90322581]
mean value: 0.8941393496147316
key: test_roc_auc
value: [0.94699872 0.87420179 0.94642857 0.85714286 0.83928571 0.91071429
0.85714286 0.85383598 0.87433862 0.87169312]
mean value: 0.8831782521437694
key: train_roc_auc
value: [0.88030728 0.88829211 0.88229622 0.89033759 0.8923061 0.88426472
0.89421881 0.89452629 0.88843028 0.88862078]
mean value: 0.8883600174671026
key: test_jcc
value: [0.89655172 0.76666667 0.9 0.75 0.72727273 0.83870968
0.75757576 0.73333333 0.78787879 0.75862069]
mean value: 0.7916609363939731
key: train_jcc
value: [0.78571429 0.79856115 0.78776978 0.80144404 0.80434783 0.79061372
0.80656934 0.80866426 0.79710145 0.8 ]
mean value: 0.7980785861054747
MCC on Blind test: 0.29
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0116744 0.01263475 0.0133822 0.01279974 0.01338243 0.01445627
0.01424313 0.01156998 0.01224852 0.01223946]
mean value: 0.012863087654113769
key: score_time
value: [0.00871682 0.00998259 0.00996375 0.01038527 0.01055145 0.01046562
0.01043034 0.01041961 0.01059127 0.01041579]
mean value: 0.010192251205444336
key: test_mcc
value: [0.89827421 0.85696041 0.89342711 0.85714286 0.59628479 0.82195294
0.79385662 0.78174603 0.85449735 0.81854376]
mean value: 0.8172686093718741
key: train_mcc
value: [0.83135263 0.909012 0.87714464 0.89219562 0.81343828 0.85235242
0.86715942 0.86343244 0.87040305 0.89653312]
mean value: 0.8673023622935534
key: test_accuracy
value: [0.94642857 0.92857143 0.94642857 0.92857143 0.78571429 0.91071429
0.89285714 0.89090909 0.92727273 0.90909091]
mean value: 0.9066558441558441
key: train_accuracy
value: [0.91217565 0.95409182 0.93812375 0.94610778 0.9001996 0.9261477
0.93213573 0.93027888 0.93426295 0.94820717]
mean value: 0.9321731039912208
key: test_fscore
value: [0.94736842 0.92592593 0.94736842 0.92857143 0.8125 0.90909091
0.9 0.88888889 0.92592593 0.90566038]
mean value: 0.9091300297866832
key: train_fscore
value: [0.91666667 0.95257732 0.93861386 0.94523327 0.9070632 0.92555332
0.93385214 0.92631579 0.93110647 0.948 ]
mean value: 0.9324982031673843
key: test_precision
value: [0.9 0.92592593 0.93103448 0.92857143 0.72222222 0.92592593
0.84375 0.88888889 0.92592593 0.92307692]
mean value: 0.8915321723295861
key: train_precision
value: [0.86428571 0.97468354 0.91860465 0.94715447 0.83848797 0.92
0.8988764 0.969163 0.96536797 0.94047619]
mean value: 0.9237099909738861
key: test_recall
value: [1. 0.92592593 0.96428571 0.92857143 0.92857143 0.89285714
0.96428571 0.88888889 0.92592593 0.88888889]
mean value: 0.9308201058201058
key: train_recall
value: [0.97580645 0.93145161 0.95951417 0.94331984 0.98785425 0.93117409
0.97165992 0.88709677 0.89919355 0.95564516]
mean value: 0.9442715815593574
key: test_roc_auc
value: [0.94827586 0.9284802 0.94642857 0.92857143 0.78571429 0.91071429
0.89285714 0.89087302 0.92724868 0.90873016]
mean value: 0.9067893632548805
key: train_roc_auc
value: [0.91280441 0.9538681 0.9384185 0.94606937 0.90140744 0.92621697
0.93268035 0.92976886 0.93384874 0.94829502]
mean value: 0.9323377764010412
key: test_jcc
value: [0.9 0.86206897 0.9 0.86666667 0.68421053 0.83333333
0.81818182 0.8 0.86206897 0.82758621]
mean value: 0.8354116482428642
key: train_jcc
value: [0.84615385 0.90944882 0.88432836 0.89615385 0.82993197 0.86142322
0.87591241 0.8627451 0.87109375 0.90114068]
mean value: 0.873833200438617
MCC on Blind test: 0.18
Accuracy on Blind test: 0.49
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01433849 0.01301265 0.0144105 0.01522279 0.0132041 0.01305103
0.01268101 0.01373744 0.0126369 0.01393795]
mean value: 0.013623285293579101
key: score_time
value: [0.01049781 0.01047158 0.01053524 0.0104003 0.01044488 0.01042581
0.01039124 0.01047635 0.01046515 0.01041508]
mean value: 0.010452342033386231
key: test_mcc
value: [0.93069263 0.85951469 0.82618439 0.89802651 0.75047877 0.78571429
0.73127242 0.85695439 0.92962225 0.8565805 ]
mean value: 0.8425040848772626
key: train_mcc
value: [0.87181962 0.83135263 0.85503558 0.88967789 0.91283821 0.88589338
0.86743952 0.82906495 0.86468284 0.92034415]
mean value: 0.8728148763208086
key: test_accuracy
value: [0.96428571 0.92857143 0.91071429 0.94642857 0.875 0.89285714
0.85714286 0.92727273 0.96363636 0.92727273]
mean value: 0.9193181818181818
key: train_accuracy
value: [0.93413174 0.91217565 0.9241517 0.94411178 0.95608782 0.94211577
0.93213573 0.91035857 0.93027888 0.96015936]
mean value: 0.9345706992389723
key: test_fscore
value: [0.96153846 0.92857143 0.90566038 0.94915254 0.87719298 0.89285714
0.84 0.92857143 0.96153846 0.92307692]
mean value: 0.9168159748341358
key: train_fscore
value: [0.93023256 0.91666667 0.91774892 0.94488189 0.95454545 0.94302554
0.9279661 0.91525424 0.92569002 0.95983936]
mean value: 0.9335850744783595
key: test_precision
value: [1. 0.89655172 0.96 0.90322581 0.86206897 0.89285714
0.95454545 0.89655172 1. 0.96 ]
mean value: 0.9325800817647314
key: train_precision
value: [0.97777778 0.86428571 0.98604651 0.91954023 0.97468354 0.91603053
0.97333333 0.85865724 0.97757848 0.956 ]
mean value: 0.940393336471731
key: test_recall
value: [0.92592593 0.96296296 0.85714286 1. 0.89285714 0.89285714
0.75 0.96296296 0.92592593 0.88888889]
mean value: 0.905952380952381
key: train_recall
value: [0.88709677 0.97580645 0.8582996 0.97165992 0.93522267 0.97165992
0.88663968 0.97983871 0.87903226 0.96370968]
mean value: 0.930896565234426
key: test_roc_auc
value: [0.96296296 0.92975734 0.91071429 0.94642857 0.875 0.89285714
0.85714286 0.92791005 0.96296296 0.9265873 ]
mean value: 0.9192323481116584
key: train_roc_auc
value: [0.93366696 0.91280441 0.92324429 0.94449138 0.95580031 0.94252287
0.93150881 0.9111792 0.92967361 0.9602013 ]
mean value: 0.9345093140199082
key: test_jcc
value: [0.92592593 0.86666667 0.82758621 0.90322581 0.78125 0.80645161
0.72413793 0.86666667 0.92592593 0.85714286]
mean value: 0.8484979599613915
key: train_jcc
value: [0.86956522 0.84615385 0.848 0.89552239 0.91304348 0.89219331
0.86561265 0.84375 0.86166008 0.92277992]
mean value: 0.8758280888468557
MCC on Blind test: 0.23
Accuracy on Blind test: 0.67
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10799265 0.09333324 0.09343123 0.09375167 0.09355068 0.09351802
0.09371734 0.09361362 0.09355211 0.09383702]
mean value: 0.09502975940704346
key: score_time
value: [0.01410651 0.01418447 0.01437783 0.01411939 0.01418042 0.01420903
0.01414371 0.01427364 0.01410794 0.01541471]
mean value: 0.014311766624450684
key: test_mcc
value: [0.96481304 0.89315584 0.96490128 0.89802651 0.85933785 0.93094934
0.96490128 0.89153439 1. 0.92724868]
mean value: 0.9294868200199901
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98214286 0.94642857 0.98214286 0.94642857 0.92857143 0.96428571
0.98214286 0.94545455 1. 0.96363636]
mean value: 0.9641233766233765
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98113208 0.94339623 0.98245614 0.94915254 0.93103448 0.96296296
0.98245614 0.94545455 1. 0.96296296]
mean value: 0.964100807910052
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96153846 0.96551724 0.90322581 0.9 1.
0.96551724 0.92857143 1. 0.96296296]
mean value: 0.9587333142283087
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96296296 0.92592593 1. 1. 0.96428571 0.92857143
1. 0.96296296 1. 0.96296296]
mean value: 0.9707671957671957
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98148148 0.94572158 0.98214286 0.94642857 0.92857143 0.96428571
0.98214286 0.9457672 1. 0.96362434]
mean value: 0.9640166028097062
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96296296 0.89285714 0.96551724 0.90322581 0.87096774 0.92857143
0.96551724 0.89655172 1. 0.92857143]
mean value: 0.9314742718246611
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.39
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03909659 0.04021478 0.03637838 0.04280972 0.04759765 0.04407358
0.04242229 0.04342175 0.03089213 0.04606533]
mean value: 0.04129722118377686
key: score_time
value: [0.02080131 0.02186036 0.0172112 0.03137207 0.02341676 0.0170188
0.03378963 0.01608229 0.01641321 0.03630662]
mean value: 0.02342722415924072
key: test_mcc
value: [1. 0.85696041 0.92857143 0.93094934 0.89342711 0.96490128
0.96490128 0.89153439 1. 0.89139151]
mean value: 0.9322636750479738
key: train_mcc
value: [0.98803016 0.98403035 0.99204516 0.99204516 0.99204692 0.98803016
0.99201441 0.99602309 0.98409121 0.99203073]
mean value: 0.9900387331545668
key: test_accuracy
value: [1. 0.92857143 0.96428571 0.96428571 0.94642857 0.98214286
0.98214286 0.94545455 1. 0.94545455]
mean value: 0.9658766233766234
key: train_accuracy
value: [0.99401198 0.99201597 0.99600798 0.99600798 0.99600798 0.99401198
0.99600798 0.99800797 0.99203187 0.99601594]
mean value: 0.9950127633179855
key: test_fscore
value: [1. 0.92592593 0.96428571 0.96551724 0.94736842 0.98181818
0.98245614 0.94545455 1. 0.94339623]
mean value: 0.9656222396682281
key: train_fscore
value: [0.99393939 0.99193548 0.99593496 0.99593496 0.99596774 0.99393939
0.99595142 0.9979798 0.99190283 0.99596774]
mean value: 0.9949453723311854
key: test_precision
value: [1. 0.92592593 0.96428571 0.93333333 0.93103448 1.
0.96551724 0.92857143 1. 0.96153846]
mean value: 0.9610206587792794
key: train_precision
value: [0.99595142 0.99193548 1. 1. 0.99196787 0.99193548
0.99595142 1. 0.99593496 0.99596774]
mean value: 0.9959644374521054
key: test_recall
value: [1. 0.92592593 0.96428571 1. 0.96428571 0.96428571
1. 0.96296296 1. 0.92592593]
mean value: 0.9707671957671957
key: train_recall
value: [0.99193548 0.99193548 0.99190283 0.99190283 1. 0.99595142
0.99595142 0.99596774 0.98790323 0.99596774]
mean value: 0.9939418179443646
key: test_roc_auc
value: [1. 0.9284802 0.96428571 0.96428571 0.94642857 0.98214286
0.98214286 0.9457672 1. 0.94510582]
mean value: 0.9658638934501004
key: train_roc_auc
value: [0.99399146 0.99201517 0.99595142 0.99595142 0.99606299 0.9940387
0.9960072 0.99798387 0.99198311 0.99601537]
mean value: 0.9950000708407828
key: test_jcc
value: [1. 0.86206897 0.93103448 0.93333333 0.9 0.96428571
0.96551724 0.89655172 1. 0.89285714]
mean value: 0.9345648604269294
key: train_jcc
value: [0.98795181 0.984 0.99190283 0.99190283 0.99196787 0.98795181
0.99193548 0.99596774 0.98393574 0.99196787]
mean value: 0.9899483994224252
MCC on Blind test: 0.14
Accuracy on Blind test: 0.37
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.17011619 0.17630982 0.1871922 0.19212961 0.17157412 0.15598917
0.15701985 0.17155218 0.17326117 0.15419507]
mean value: 0.17093393802642823
key: score_time
value: [0.02660775 0.02126074 0.02073812 0.01927352 0.02623224 0.02367377
0.0206089 0.02606893 0.02499342 0.01983643]
mean value: 0.02292938232421875
key: test_mcc
value: [0.89342711 0.74984143 0.89342711 0.71428571 0.71428571 0.78571429
0.68250015 0.78174603 0.72754449 0.81854376]
mean value: 0.7761315807091642
key: train_mcc
value: [0.83651026 0.85265474 0.84449262 0.84078809 0.85265708 0.84078809
0.84078809 0.84907279 0.86501334 0.85318007]
mean value: 0.8475945177079469
key: test_accuracy
value: [0.94642857 0.875 0.94642857 0.85714286 0.85714286 0.89285714
0.83928571 0.89090909 0.85454545 0.90909091]
mean value: 0.8868831168831168
key: train_accuracy
value: [0.91816367 0.9261477 0.92215569 0.92015968 0.9261477 0.92015968
0.92015968 0.92430279 0.93227092 0.92629482]
mean value: 0.9235962338271664
key: test_fscore
value: [0.94545455 0.86792453 0.94736842 0.85714286 0.85714286 0.89285714
0.84745763 0.88888889 0.86666667 0.90566038]
mean value: 0.8876563911984611
key: train_fscore
value: [0.91816367 0.92644135 0.92184369 0.92031873 0.9261477 0.92031873
0.92031873 0.92460317 0.93253968 0.92673267]
mean value: 0.9237428122217914
key: test_precision
value: [0.92857143 0.88461538 0.93103448 0.85714286 0.85714286 0.89285714
0.80645161 0.88888889 0.78787879 0.92307692]
mean value: 0.8757660365836116
key: train_precision
value: [0.90909091 0.91372549 0.91269841 0.90588235 0.91338583 0.90588235
0.90588235 0.91015625 0.91796875 0.91050584]
mean value: 0.9105178534156458
key: test_recall
value: [0.96296296 0.85185185 0.96428571 0.85714286 0.85714286 0.89285714
0.89285714 0.88888889 0.96296296 0.88888889]
mean value: 0.901984126984127
key: train_recall
value: [0.92741935 0.93951613 0.93117409 0.93522267 0.93927126 0.93522267
0.93522267 0.93951613 0.94758065 0.94354839]
mean value: 0.9373694005485177
key: test_roc_auc
value: [0.94699872 0.87420179 0.94642857 0.85714286 0.85714286 0.89285714
0.83928571 0.89087302 0.85648148 0.90873016]
mean value: 0.8870142309797482
key: train_roc_auc
value: [0.91825513 0.9262798 0.92227996 0.92036724 0.92632854 0.92036724
0.92036724 0.92448247 0.93245174 0.9264986 ]
mean value: 0.9237677975946037
key: test_jcc
value: [0.89655172 0.76666667 0.9 0.75 0.75 0.80645161
0.73529412 0.8 0.76470588 0.82758621]
mean value: 0.7997256210604375
key: train_jcc
value: [0.84870849 0.86296296 0.85501859 0.85239852 0.86245353 0.85239852
0.85239852 0.8597786 0.87360595 0.86346863]
mean value: 0.8583192321390376
MCC on Blind test: 0.29
Accuracy on Blind test: 0.73
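The blind-test lines report both MCC and accuracy, and the two can diverge when the classes are imbalanced. A minimal sketch of how both follow from a confusion matrix is given here. The counts are hypothetical, chosen only for illustration, and are not taken from this run; the function names are also illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced blind set: 50 positives, 150 negatives.
tp, fn, fp, tn = 20, 30, 10, 140

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(f"accuracy={acc:.2f} mcc={score:.2f}")
```

Accuracy rewards the large negative class, whereas MCC uses all four cells of the confusion matrix, so it stays low when one class is predicted poorly.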
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.25510955 0.24359798 0.24252176 0.24263096 0.24308658 0.2433722
0.24473262 0.24413776 0.24478555 0.24284172]
mean value: 0.2446816682815552
key: score_time
value: [0.00862026 0.00837231 0.00834084 0.00830126 0.00861955 0.00825286
0.00839138 0.00829506 0.00852728 0.00854683]
mean value: 0.008426761627197266
key: test_mcc
value: [1. 0.9284802 0.92857143 0.93094934 0.85933785 0.96490128
0.96490128 0.89153439 1. 0.8565805 ]
mean value: 0.9325256275611022
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.96428571 0.96428571 0.96428571 0.92857143 0.98214286
0.98214286 0.94545455 1. 0.92727273]
mean value: 0.9658441558441558
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.96296296 0.96428571 0.96551724 0.93103448 0.98181818
0.98245614 0.94545455 1. 0.92307692]
mean value: 0.9656606192087136
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.96428571 0.93333333 0.9 1.
0.96551724 0.92857143 1. 0.96 ]
mean value: 0.961467068053275
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.96296296 0.96428571 1. 0.96428571 0.96428571
1. 0.96296296 1. 0.88888889]
mean value: 0.9707671957671957
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9642401 0.96428571 0.96428571 0.92857143 0.98214286
0.98214286 0.9457672 1. 0.9265873 ]
mean value: 0.9658023170954205
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.92857143 0.93103448 0.93333333 0.87096774 0.96428571
0.96551724 0.89655172 1. 0.85714286]
mean value: 0.9347404523544679
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.3
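The Gradient Boosting record above scores 1.0 on every training fold yet reports an MCC of 0.1 on the blind test. The sketch below uses only the values printed above to recompute the reported fold mean and the two gaps (train vs. cross-validation, cross-validation vs. blind test). Variable names are illustrative, not taken from the script.

```python
from statistics import mean

# Fold values copied from the Gradient Boosting record in this log.
test_mcc = [1.0, 0.9284802, 0.92857143, 0.93094934, 0.85933785,
            0.96490128, 0.96490128, 0.89153439, 1.0, 0.8565805]
train_mcc = [1.0] * 10
blind_mcc = 0.1  # "MCC on Blind test" for this model

cv_mean = mean(test_mcc)
train_mean = mean(train_mcc)

# A large drop from CV to the blind test suggests the CV folds are not
# representative of the blind set, or that the model is overfitting.
gap_train_cv = train_mean - cv_mean
gap_cv_blind = cv_mean - blind_mcc

print(f"CV mean MCC:    {cv_mean:.4f}")
print(f"train - CV gap: {gap_train_cv:.4f}")
print(f"CV - blind gap: {gap_cv_blind:.4f}")
```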
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01189399 0.01387286 0.01450205 0.01408744 0.01389623 0.0166738
0.01422548 0.01454496 0.01396155 0.01419568]
mean value: 0.014185404777526856
key: score_time
value: [0.01086521 0.01088333 0.01085353 0.01082253 0.01088142 0.01098228
0.01087499 0.01154423 0.01157999 0.01079154]
mean value: 0.011007905006408691
key: test_mcc
value: [0.9284802 0.54871911 0.78772636 0.71611487 0.47187011 0.60753044
0.68250015 0.60268595 0.81878307 0.67602163]
mean value: 0.6840431895998464
key: train_mcc
value: [0.80440606 0.81032473 0.79940894 0.79646944 0.70336606 0.78901365
0.78773489 0.80486309 0.79845601 0.7610531 ]
mean value: 0.7855095967948068
key: test_accuracy
value: [0.96428571 0.76785714 0.89285714 0.85714286 0.73214286 0.80357143
0.83928571 0.8 0.90909091 0.83636364]
mean value: 0.8402597402597403
key: train_accuracy
value: [0.90219561 0.90419162 0.89620758 0.89820359 0.83433134 0.89221557
0.89221557 0.90039841 0.89840637 0.87848606]
mean value: 0.88968517148969
key: test_fscore
value: [0.96296296 0.72340426 0.88888889 0.85185185 0.70588235 0.80701754
0.83018868 0.78431373 0.90909091 0.82352941]
mean value: 0.8287130581414772
key: train_fscore
value: [0.90060852 0.89958159 0.88695652 0.89570552 0.8 0.88412017
0.88510638 0.89361702 0.89352818 0.86993603]
mean value: 0.8809159946199812
key: test_precision
value: [0.96296296 0.85 0.92307692 0.88461538 0.7826087 0.79310345
0.88 0.83333333 0.89285714 0.875 ]
mean value: 0.8677557890773783
key: train_precision
value: [0.90612245 0.93478261 0.95774648 0.90495868 0.98809524 0.94063927
0.93273543 0.94594595 0.92640693 0.92307692]
mean value: 0.9360509943174828
key: test_recall
value: [0.96296296 0.62962963 0.85714286 0.82142857 0.64285714 0.82142857
0.78571429 0.74074074 0.92592593 0.77777778]
mean value: 0.7965608465608466
key: train_recall
value: [0.89516129 0.86693548 0.82591093 0.88663968 0.67206478 0.8340081
0.84210526 0.84677419 0.86290323 0.82258065]
mean value: 0.8355083583648949
key: test_roc_auc
value: [0.9642401 0.76309068 0.89285714 0.85714286 0.73214286 0.80357143
0.83928571 0.7989418 0.90939153 0.83531746]
mean value: 0.8395981572705711
key: train_roc_auc
value: [0.9021261 0.90382347 0.89523893 0.89804425 0.83209538 0.8914135
0.89152507 0.89976505 0.89798705 0.87782576]
mean value: 0.8889844552398375
key: test_jcc
value: [0.92857143 0.56666667 0.8 0.74193548 0.54545455 0.67647059
0.70967742 0.64516129 0.83333333 0.7 ]
mean value: 0.7147270755809655
key: train_jcc
value: [0.81918819 0.81749049 0.796875 0.81111111 0.66666667 0.79230769
0.79389313 0.80769231 0.80754717 0.76981132]
mean value: 0.7882583084293304
MCC on Blind test: 0.32
Accuracy on Blind test: 0.72
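The pipeline printed above applies OneHotEncoder() with default settings to seven categorical columns. One possible contributor to the "Variables are collinear" warning raised for QDA is that each fully one-hot-encoded group sums to 1 in every row, which makes the encoded columns linearly dependent. The sketch below shows that dependency in plain Python. The category levels and rows are hypothetical, and it does not establish that this is the actual or only cause in this run.

```python
def one_hot(value, categories):
    """Encode a single value as a 0/1 indicator list over `categories`."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical levels for two of the categorical features.
ss_levels = ["helix", "sheet", "loop"]
site_levels = [0, 1]

# Hypothetical rows: (ss_class, active_site)
rows = [("helix", 0), ("sheet", 1), ("loop", 0), ("helix", 1)]

encoded = [one_hot(ss, ss_levels) + one_hot(site, site_levels)
           for ss, site in rows]

# Within each row, the ss_class indicators and the active_site indicators
# both sum to 1, so (sum of group 1) - (sum of group 2) is always 0:
# an exact linear dependency among the encoded columns.
n_ss = len(ss_levels)
residuals = [sum(r[:n_ss]) - sum(r[n_ss:]) for r in encoded]
print(residuals)
```

Dropping one level per categorical feature, or regularising the covariance estimate, are common ways to remove this kind of dependency.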
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01984334 0.02952814 0.0294714 0.0295279 0.02948236 0.02966619
0.02946544 0.02954245 0.02954078 0.02948689]
mean value: 0.02855548858642578
key: score_time
value: [0.01642299 0.01944399 0.02070069 0.01963282 0.01078868 0.02068901
0.01969433 0.0106039 0.02023768 0.02003217]
mean value: 0.017824625968933104
key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.78772636 0.67900461 0.85933785
0.71611487 0.78174603 0.71735629 0.81854376]
mean value: 0.8005790004186416
key: train_mcc
value: [0.82071187 0.83279667 0.82071472 0.80065667 0.82921429 0.83720268
0.82507217 0.81310081 0.82516195 0.81719167]
mean value: 0.822182349120051
key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.89285714 0.83928571 0.92857143
0.85714286 0.89090909 0.85454545 0.90909091]
mean value: 0.8993831168831169
key: train_accuracy
value: [0.91017964 0.91616766 0.91017964 0.9001996 0.91417166 0.91816367
0.91217565 0.9063745 0.9123506 0.90836653]
mean value: 0.9108329158416235
key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.89655172 0.84210526 0.92592593
0.86206897 0.88888889 0.86206897 0.90566038]
mean value: 0.900058462320045
key: train_fscore
value: [0.91053678 0.91666667 0.91017964 0.9 0.91485149 0.91881188
0.91269841 0.90656064 0.91269841 0.90873016]
mean value: 0.9111734073355806
key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.86666667 0.82758621 0.96153846
0.83333333 0.88888889 0.80645161 0.92307692]
mean value: 0.8892687981898215
key: train_precision
value: [0.89803922 0.90234375 0.8976378 0.88932806 0.89534884 0.89922481
0.89494163 0.89411765 0.8984375 0.89453125]
mean value: 0.8963950498913893
key: test_recall
value: [1. 0.88888889 0.96428571 0.92857143 0.85714286 0.89285714
0.89285714 0.88888889 0.92592593 0.88888889]
mean value: 0.9128306878306878
key: train_recall
value: [0.9233871 0.93145161 0.92307692 0.91093117 0.93522267 0.93927126
0.93117409 0.91935484 0.92741935 0.9233871 ]
mean value: 0.9264676113360324
key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.89285714 0.83928571 0.92857143
0.85714286 0.89087302 0.85582011 0.90873016]
mean value: 0.899518792191206
key: train_roc_auc
value: [0.91031015 0.91631869 0.91035736 0.90034748 0.91446173 0.91845453
0.91243744 0.90652781 0.91252858 0.90854394]
mean value: 0.9110287700326485
key: test_jcc
value: [0.93103448 0.82758621 0.9 0.8125 0.72727273 0.86206897
0.75757576 0.8 0.75757576 0.82758621]
mean value: 0.8203200104493208
key: train_jcc
value: [0.83576642 0.84615385 0.83516484 0.81818182 0.84306569 0.84981685
0.83941606 0.82909091 0.83941606 0.83272727]
mean value: 0.8368799764712174
MCC on Blind test: 0.25
Accuracy on Blind test: 0.71
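Assuming test_jcc is the Jaccard score (the log does not expand the key name), it is tied to the F-score for binary classification by J = F / (2 - F). The sketch below checks that relation against the Ridge Classifier fold values printed above. Variable names are illustrative.

```python
# Fold values copied from the Ridge Classifier record in this log.
test_fscore = [0.96428571, 0.90566038, 0.94736842, 0.89655172, 0.84210526,
               0.92592593, 0.86206897, 0.88888889, 0.86206897, 0.90566038]
test_jcc = [0.93103448, 0.82758621, 0.9, 0.8125, 0.72727273,
            0.86206897, 0.75757576, 0.8, 0.75757576, 0.82758621]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the equivalent Jaccard index."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]
max_err = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest difference from logged test_jcc: {max_err:.2e}")
```

Because the two metrics are monotonically related, ranking models by either one on a given fold gives the same order.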
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.19355822 0.24611878 0.21725583 0.18385339 0.20384955 0.20188379
0.19162393 0.19133949 0.19330359 0.22055078]
mean value: 0.2043337345123291
key: score_time
value: [0.02099872 0.01083326 0.02049541 0.02122831 0.02145958 0.01939106
0.01959395 0.02021599 0.01877356 0.0112946 ]
mean value: 0.018428444862365723
key: test_mcc
value: [0.93103448 0.82149863 0.89342711 0.78772636 0.67900461 0.89342711
0.67900461 0.78174603 0.71735629 0.8565805 ]
mean value: 0.8040805738070792
key: train_mcc
value: [0.84902508 0.84902508 0.84856792 0.80065667 0.86474639 0.86116786
0.85289102 0.8493299 0.83338631 0.84549238]
mean value: 0.8454288618434195
key: test_accuracy
value: [0.96428571 0.91071429 0.94642857 0.89285714 0.83928571 0.94642857
0.83928571 0.89090909 0.85454545 0.92727273]
mean value: 0.9012012987012987
key: train_accuracy
value: [0.9241517 0.9241517 0.9241517 0.9001996 0.93213573 0.93013972
0.9261477 0.92430279 0.91633466 0.92231076]
mean value: 0.9224026051482692
key: test_fscore
value: [0.96428571 0.90566038 0.94736842 0.89655172 0.84210526 0.94545455
0.84210526 0.88888889 0.86206897 0.92307692]
mean value: 0.9017566086088156
key: train_fscore
value: [0.92490119 0.92490119 0.924 0.9 0.93227092 0.93069307
0.92644135 0.92490119 0.91699605 0.92307692]
mean value: 0.9228181865350266
key: test_precision
value: [0.93103448 0.92307692 0.93103448 0.86666667 0.82758621 0.96296296
0.82758621 0.88888889 0.80645161 0.96 ]
mean value: 0.8925288433809012
key: train_precision
value: [0.90697674 0.90697674 0.91304348 0.88932806 0.91764706 0.91085271
0.91015625 0.90697674 0.89922481 0.9034749 ]
mean value: 0.9064657505738394
key: test_recall
value: [1. 0.88888889 0.96428571 0.92857143 0.85714286 0.92857143
0.85714286 0.88888889 0.92592593 0.88888889]
mean value: 0.9128306878306878
key: train_recall
value: [0.94354839 0.94354839 0.93522267 0.91093117 0.94736842 0.951417
0.94331984 0.94354839 0.93548387 0.94354839]
mean value: 0.939793652866658
key: test_roc_auc
value: [0.96551724 0.90996169 0.94642857 0.89285714 0.83928571 0.94642857
0.83928571 0.89087302 0.85582011 0.9265873 ]
mean value: 0.9013045064769203
key: train_roc_auc
value: [0.92434336 0.92434336 0.92430425 0.90034748 0.93234563 0.93043291
0.92638433 0.9245301 0.91656083 0.9225616 ]
mean value: 0.9226153848348726
key: test_jcc
value: [0.93103448 0.82758621 0.9 0.8125 0.72727273 0.89655172
0.72727273 0.8 0.75757576 0.85714286]
mean value: 0.8236936483057172
key: train_jcc
value: [0.86029412 0.86029412 0.85873606 0.81818182 0.87313433 0.87037037
0.86296296 0.86029412 0.84671533 0.85714286]
mean value: 0.8568126077904101
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02480316 0.04043341 0.03790212 0.02258253 0.02496028 0.02353382
0.02346492 0.02378702 0.02321482 0.02300453]
mean value: 0.02676866054534912
key: score_time
value: [0.01101851 0.01310444 0.01055646 0.01048899 0.01055861 0.01072121
0.01049709 0.01050425 0.01051044 0.01050949]
mean value: 0.010846948623657227
key: test_mcc
value: [0.8953202 0.8953202 0.82512315 0.79110556 0.71611487 0.89342711
0.75434227 0.75047877 0.68250015 0.82195294]
mean value: 0.8025685230193058
key: train_mcc
value: [0.82263766 0.83068165 0.82666897 0.83070006 0.83890131 0.81930411
0.83123063 0.8387452 0.83529327 0.81527029]
mean value: 0.8289433160428895
key: test_accuracy
value: [0.94736842 0.94736842 0.9122807 0.89473684 0.85714286 0.94642857
0.875 0.875 0.83928571 0.91071429]
mean value: 0.900532581453634
key: train_accuracy
value: [0.9112426 0.91518738 0.91321499 0.91518738 0.91929134 0.90944882
0.91535433 0.91929134 0.91732283 0.90748031]
mean value: 0.9143021323517992
key: test_fscore
value: [0.94736842 0.94736842 0.9122807 0.9 0.86206897 0.94545455
0.88135593 0.87272727 0.84745763 0.90909091]
mean value: 0.9025172795971652
key: train_fscore
value: [0.9122807 0.91650485 0.9140625 0.91617934 0.92038835 0.91085271
0.91682785 0.92007797 0.91891892 0.90873786]
mean value: 0.9154831064752351
key: test_precision
value: [0.93103448 0.93103448 0.92857143 0.87096774 0.83333333 0.96296296
0.83870968 0.88888889 0.80645161 0.92592593]
mean value: 0.8917880537457845
key: train_precision
value: [0.9034749 0.90421456 0.9034749 0.90384615 0.90804598 0.89694656
0.90114068 0.91119691 0.90151515 0.89655172]
mean value: 0.9030407533340564
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
0.92857143 0.85714286 0.89285714 0.89285714]
mean value: 0.9149014778325123
key: train_recall
value: [0.92125984 0.92913386 0.92490119 0.92885375 0.93307087 0.92519685
0.93307087 0.92913386 0.93700787 0.92125984]
mean value: 0.9282888798979179
key: test_roc_auc
value: [0.9476601 0.9476601 0.91256158 0.89408867 0.85714286 0.94642857
0.875 0.875 0.83928571 0.91071429]
mean value: 0.9005541871921182
key: train_roc_auc
value: [0.91122281 0.91515981 0.91323799 0.91521428 0.91929134 0.90944882
0.91535433 0.91929134 0.91732283 0.90748031]
mean value: 0.914302387102798
key: test_jcc
value: [0.9 0.9 0.83870968 0.81818182 0.75757576 0.89655172
0.78787879 0.77419355 0.73529412 0.83333333]
mean value: 0.8241718764561139
key: train_jcc
value: [0.83870968 0.84587814 0.84172662 0.84532374 0.85251799 0.83629893
0.84642857 0.85198556 0.85 0.83274021]
mean value: 0.8441609435846644
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
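Note: each "mean value" line above appears to be the arithmetic mean of the ten fold values printed under the same key. The sketch below recomputes it for test_mcc and compares it with the blind-test MCC. The fold values and the blind-test figure are copied from this log; the variable names are illustrative and are not taken from the original script.

```python
from statistics import fmean, stdev

# test_mcc fold values copied from the log above (10 folds)
test_mcc = [0.8953202, 0.8953202, 0.82512315, 0.79110556, 0.71611487,
            0.89342711, 0.75434227, 0.75047877, 0.68250015, 0.82195294]

cv_mean = fmean(test_mcc)   # should reproduce the logged "mean value"
cv_sd = stdev(test_mcc)     # fold-to-fold spread (sample standard deviation)

blind_mcc = 0.28            # "MCC on Blind test" as logged (rounded to 2 d.p.)
gap = cv_mean - blind_mcc   # how far the blind test falls below the CV mean

print(f"CV mean MCC: {cv_mean:.4f} (sd {cv_sd:.4f}); gap to blind test: {gap:.2f}")
```

A gap this large relative to the fold-to-fold spread suggests the cross-validation scores may not carry over to the blind set, which is worth checking before comparing models on CV means alone.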
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
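Note: the warning above names two remedies: raise max_iter or scale the data. A minimal sketch of both follows, assuming scikit-learn is installed. The synthetic data merely mimics the (557, 53) shape reported earlier in this log, and max_iter=1000 is an arbitrary illustrative value; neither is taken from the original script, whose preprocessing is not shown here.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features (assumption: same shape only).
X, y = make_classification(n_samples=557, n_features=53, random_state=42)
# Put the columns on very different scales, a common cause of slow lbfgs convergence.
X = X * np.logspace(0, 4, X.shape[1])

# Scale inside the pipeline so each CV fold is standardised on its own training
# part, and give lbfgs more iterations than the default of 100.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(max_iter=1000, random_state=42),
)

# Record any ConvergenceWarning raised during fitting instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    model.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
train_accuracy = model.score(X, y)
print("ConvergenceWarnings raised:", n_convergence_warnings)
print("Training accuracy:", round(train_accuracy, 3))
```

If warnings still appear after scaling, the documentation linked in the warning lists alternative solvers that may be worth trying.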
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.66862154 0.89965343 0.87873912 0.72013283 0.79384041 0.85204673
0.71222496 0.79153037 0.77842331 0.70308733]
mean value: 0.779830002784729
key: score_time
value: [0.01159906 0.02059031 0.01229548 0.01344728 0.01333761 0.01349497
0.01231074 0.0122211 0.0130713 0.01268458]
mean value: 0.013505244255065918
key: test_mcc
value: [0.93202124 0.92980296 0.92980296 0.85960591 0.78772636 1.
0.85933785 0.85714286 0.78772636 0.78772636]
mean value: 0.8730892854406824
key: train_mcc
value: [0.93294638 0.93691352 0.94480151 0.93691156 0.93703692 0.93703692
0.92520402 0.9332517 0.92520402 0.94095217]
mean value: 0.9350258732625361
key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.92982456 0.89285714 1.
0.92857143 0.92857143 0.89285714 0.89285714]
mean value: 0.9360275689223058
key: train_accuracy
value: [0.96646943 0.96844181 0.97238659 0.96844181 0.96850394 0.96850394
0.96259843 0.96653543 0.96259843 0.97047244]
mean value: 0.967495224339561
key: test_fscore
value: [0.96296296 0.96428571 0.96551724 0.93103448 0.89655172 1.
0.93103448 0.92857143 0.89655172 0.88888889]
mean value: 0.9365398649881409
key: train_fscore
value: [0.96646943 0.96837945 0.97222222 0.96825397 0.96837945 0.96837945
0.96267191 0.96620278 0.96267191 0.9704142 ]
mean value: 0.9674044754283551
key: test_precision
value: [1. 0.96428571 0.96551724 0.93103448 0.86666667 1.
0.9 0.92857143 0.86666667 0.92307692]
mean value: 0.934581912340533
key: train_precision
value: [0.96837945 0.97222222 0.97609562 0.97211155 0.97222222 0.97222222
0.96078431 0.97590361 0.96078431 0.97233202]
mean value: 0.9703057542340813
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 1.
0.96428571 0.92857143 0.92857143 0.85714286]
mean value: 0.9396551724137931
key: train_recall
value: [0.96456693 0.96456693 0.96837945 0.96442688 0.96456693 0.96456693
0.96456693 0.95669291 0.96456693 0.96850394]
mean value: 0.9645404749307522
key: test_roc_auc
value: [0.96428571 0.96490148 0.96490148 0.92980296 0.89285714 1.
0.92857143 0.92857143 0.89285714 0.89285714]
mean value: 0.935960591133005
key: train_roc_auc
value: [0.96647319 0.96844947 0.9723787 0.96843391 0.96850394 0.96850394
0.96259843 0.96653543 0.96259843 0.97047244]
mean value: 0.9674947869658586
key: test_jcc
value: [0.92857143 0.93103448 0.93333333 0.87096774 0.8125 1.
0.87096774 0.86666667 0.8125 0.8 ]
mean value: 0.8826541395201017
key: train_jcc
value: [0.9351145 0.93869732 0.94594595 0.93846154 0.93869732 0.93869732
0.9280303 0.93461538 0.9280303 0.94252874]
mean value: 0.9368818668555441
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01145101 0.01084852 0.0087235 0.0084331 0.0084486 0.00842762
0.00777602 0.00863409 0.00846457 0.00798273]
mean value: 0.008918976783752442
key: score_time
value: [0.01097488 0.00906825 0.00876808 0.00882697 0.0086298 0.00885534
0.00874305 0.00842094 0.00866842 0.00866818]
mean value: 0.008962392807006836
key: test_mcc
value: [0.77728159 0.68736396 0.77903565 0.56277738 0.43876345 0.49030429
0.75434227 0.65814518 0.73127242 0.65814518]
mean value: 0.6537431378840208
key: train_mcc
value: [0.65218808 0.64992518 0.66460838 0.66501403 0.62068788 0.66768511
0.6527166 0.71796573 0.66658604 0.66539291]
mean value: 0.66227699222193
key: test_accuracy
value: [0.87719298 0.84210526 0.87719298 0.77192982 0.71428571 0.73214286
0.875 0.82142857 0.85714286 0.82142857]
mean value: 0.818984962406015
key: train_accuracy
value: [0.81854043 0.81656805 0.82445759 0.82248521 0.79330709 0.82677165
0.81889764 0.85629921 0.82677165 0.82480315]
mean value: 0.8228901675752069
key: test_fscore
value: [0.85714286 0.83018868 0.8627451 0.74509804 0.68 0.68085106
0.86792453 0.8 0.84 0.8 ]
mean value: 0.7963950265774716
key: train_fscore
value: [0.79735683 0.79379157 0.80266075 0.7972973 0.75294118 0.80701754
0.79735683 0.84696017 0.80786026 0.80353201]
mean value: 0.8006774440728486
key: test_precision
value: [1. 0.88 1. 0.86363636 0.77272727 0.84210526
0.92 0.90909091 0.95454545 0.90909091]
mean value: 0.9051196172248803
key: train_precision
value: [0.905 0.90862944 0.91414141 0.92670157 0.93567251 0.91089109
0.905 0.9058296 0.90686275 0.91457286]
mean value: 0.9133301236007405
key: test_recall
value: [0.75 0.78571429 0.75862069 0.65517241 0.60714286 0.57142857
0.82142857 0.71428571 0.75 0.71428571]
mean value: 0.712807881773399
key: train_recall
value: [0.71259843 0.70472441 0.71541502 0.69960474 0.62992126 0.72440945
0.71259843 0.79527559 0.72834646 0.71653543]
mean value: 0.7139429211664747
key: test_roc_auc
value: [0.875 0.841133 0.87931034 0.77401478 0.71428571 0.73214286
0.875 0.82142857 0.85714286 0.82142857]
mean value: 0.8190886699507389
key: train_roc_auc
value: [0.81874981 0.81678908 0.82424294 0.82224332 0.79330709 0.82677165
0.81889764 0.85629921 0.82677165 0.82480315]
mean value: 0.8228875540755034
key: test_jcc
value: [0.75 0.70967742 0.75862069 0.59375 0.51515152 0.51612903
0.76666667 0.66666667 0.72413793 0.66666667]
mean value: 0.6667466587454074
key: train_jcc
value: [0.66300366 0.65808824 0.67037037 0.66292135 0.60377358 0.67647059
0.66300366 0.73454545 0.67765568 0.67158672]
mean value: 0.6681419301195666
MCC on Blind test: 0.34
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00911832 0.00889468 0.00884533 0.00869632 0.00868821 0.00798774
0.00814915 0.0086484 0.0087316 0.00859666]
mean value: 0.008635640144348145
key: score_time
value: [0.00931168 0.00890446 0.00889421 0.00877428 0.00848413 0.00837326
0.00868559 0.00825214 0.00880623 0.00885201]
mean value: 0.008733797073364257
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
0.67900461 0.75047877 0.64450339 0.82195294]
mean value: 0.766179444196459
key: train_mcc
value: [0.76340037 0.76340037 0.76353762 0.75544282 0.77564465 0.77588525
0.77564465 0.77991449 0.79149195 0.76800824]
mean value: 0.7712370421013379
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.85964912 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.8827694235588972
key: train_accuracy
value: [0.8816568 0.8816568 0.8816568 0.87771203 0.88779528 0.88779528
0.88779528 0.88976378 0.89566929 0.88385827]
mean value: 0.8855359611113699
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.86206897 0.86206897 0.87272727
0.84210526 0.87272727 0.82758621 0.90909091]
mean value: 0.8839058461200022
key: train_fscore
value: [0.8828125 0.8828125 0.8828125 0.87698413 0.88845401 0.88932039
0.88845401 0.89147287 0.89668616 0.88543689]
mean value: 0.8865245960082
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
0.82758621 0.88888889 0.8 0.92592593]
mean value: 0.8785312899106003
key: train_precision
value: [0.87596899 0.87596899 0.87258687 0.88047809 0.88326848 0.87739464
0.88326848 0.8778626 0.88803089 0.87356322]
mean value: 0.8788391247569809
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
0.85714286 0.85714286 0.85714286 0.89285714]
mean value: 0.8900246305418719
key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.87351779 0.89370079 0.9015748
0.89370079 0.90551181 0.90551181 0.8976378 ]
mean value: 0.8943963773303041
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.85960591 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.882820197044335
key: train_roc_auc
value: [0.88164078 0.88164078 0.88167969 0.87770378 0.88779528 0.88779528
0.88779528 0.88976378 0.89566929 0.88385827]
mean value: 0.8855342192897825
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
0.72727273 0.77419355 0.70588235 0.83333333]
mean value: 0.7939704444827784
key: train_jcc
value: [0.79020979 0.79020979 0.79020979 0.78091873 0.79929577 0.8006993
0.79929577 0.8041958 0.81272085 0.79442509]
mean value: 0.7962180687899996
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00799131 0.0081377 0.00847793 0.00799417 0.00966215 0.01012588
0.00796223 0.00843406 0.00802898 0.00803018]
mean value: 0.008484458923339844
key: score_time
value: [0.01276636 0.0125227 0.01176548 0.0133183 0.01890969 0.01576591
0.01556277 0.01163697 0.01146603 0.01169634]
mean value: 0.013541054725646973
key: test_mcc
value: [0.8953202 0.78940887 0.71921182 0.79110556 0.75047877 0.68250015
0.60753044 0.75047877 0.58501794 0.82195294]
mean value: 0.7393005465274064
key: train_mcc
value: [0.78707279 0.78304441 0.77919572 0.79093074 0.79951627 0.78742599
0.80317451 0.80759374 0.79936749 0.78395685]
mean value: 0.79212785011907
key: test_accuracy
value: [0.94736842 0.89473684 0.85964912 0.89473684 0.875 0.83928571
0.80357143 0.875 0.78571429 0.91071429]
mean value: 0.868577694235589
key: train_accuracy
value: [0.89349112 0.89151874 0.88954635 0.89546351 0.8996063 0.89370079
0.9015748 0.90354331 0.8996063 0.89173228]
mean value: 0.8959783503393437
key: test_fscore
value: [0.94736842 0.89285714 0.86206897 0.9 0.87719298 0.83018868
0.80701754 0.87272727 0.80645161 0.90909091]
mean value: 0.8704963529709496
key: train_fscore
value: [0.89453125 0.89151874 0.89019608 0.8950495 0.90097087 0.89411765
0.90196078 0.90522244 0.9005848 0.89361702]
mean value: 0.8967769129948973
key: test_precision
value: [0.93103448 0.89285714 0.86206897 0.87096774 0.86206897 0.88
0.79310345 0.88888889 0.73529412 0.92592593]
mean value: 0.8642209679323466
key: train_precision
value: [0.8875969 0.89328063 0.88326848 0.8968254 0.88888889 0.890625
0.8984375 0.88973384 0.89189189 0.878327 ]
mean value: 0.8898875528234225
key: test_recall
value: [0.96428571 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
0.82142857 0.85714286 0.89285714 0.89285714]
mean value: 0.8793103448275862
key: train_recall
value: [0.9015748 0.88976378 0.8972332 0.89328063 0.91338583 0.8976378
0.90551181 0.92125984 0.90944882 0.90944882]
mean value: 0.9038545330055087
key: test_roc_auc
value: [0.9476601 0.89470443 0.85960591 0.89408867 0.875 0.83928571
0.80357143 0.875 0.78571429 0.91071429]
mean value: 0.8685344827586207
key: train_roc_auc
value: [0.89347515 0.89152221 0.88956148 0.89545921 0.8996063 0.89370079
0.9015748 0.90354331 0.8996063 0.89173228]
mean value: 0.8959781830630855
key: test_jcc
value: [0.9 0.80645161 0.75757576 0.81818182 0.78125 0.70967742
0.67647059 0.77419355 0.67567568 0.83333333]
mean value: 0.7732809753647041
key: train_jcc
value: [0.80918728 0.80427046 0.80212014 0.81003584 0.81978799 0.80851064
0.82142857 0.82685512 0.81914894 0.80769231]
mean value: 0.8129037288551658
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01498389 0.01484656 0.01499653 0.01508522 0.01529217 0.01519728
0.0148685 0.01477909 0.01449656 0.01571655]
mean value: 0.015026235580444336
key: score_time
value: [0.00945044 0.00991273 0.00937939 0.00918961 0.0093348 0.00939512
0.00931787 0.00938892 0.0092535 0.00957513]
mean value: 0.009419751167297364
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.75462449 0.71611487 0.78772636
0.67900461 0.75047877 0.64450339 0.78772636]
mean value: 0.7770425163515529
key: train_mcc
value: [0.77929987 0.77929987 0.78334713 0.79108822 0.79936749 0.78779242
0.80337378 0.79567034 0.80324922 0.77974514]
mean value: 0.7902233465851163
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.85714286 0.89285714
0.83928571 0.875 0.82142857 0.89285714]
mean value: 0.8880325814536341
key: train_accuracy
value: [0.88954635 0.88954635 0.89151874 0.89546351 0.8996063 0.89370079
0.9015748 0.8976378 0.9015748 0.88976378]
mean value: 0.894993321840687
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.86206897 0.88888889
0.84210526 0.87272727 0.82758621 0.88888889]
mean value: 0.8889392743144012
key: train_fscore
value: [0.89105058 0.89105058 0.89278752 0.8962818 0.9005848 0.89534884
0.90272374 0.89922481 0.90234375 0.89105058]
mean value: 0.8962446999871675
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.83333333 0.92307692
0.82758621 0.88888889 0.8 0.92307692]
mean value: 0.8855732390215149
key: train_precision
value: [0.88076923 0.88076923 0.88076923 0.8875969 0.89189189 0.88167939
0.89230769 0.88549618 0.89534884 0.88076923]
mean value: 0.88573978162297
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.85714286 0.85714286 0.85714286 0.85714286]
mean value: 0.8934729064039408
key: train_recall
value: [0.9015748 0.9015748 0.90513834 0.90513834 0.90944882 0.90944882
0.91338583 0.91338583 0.90944882 0.9015748 ]
mean value: 0.9070119199526937
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.87684729 0.85714286 0.89285714
0.83928571 0.875 0.82142857 0.89285714]
mean value: 0.8880541871921183
key: train_roc_auc
value: [0.88952258 0.88952258 0.89154555 0.89548256 0.8996063 0.89370079
0.9015748 0.8976378 0.9015748 0.88976378]
mean value: 0.8949931530297843
key: test_jcc
value: [0.9 0.9 0.87096774 0.78787879 0.75757576 0.8
0.72727273 0.77419355 0.70588235 0.8 ]
mean value: 0.802377091599103
key: train_jcc
value: [0.80350877 0.80350877 0.80633803 0.81205674 0.81914894 0.81052632
0.82269504 0.81690141 0.82206406 0.80350877]
mean value: 0.8120256834358026
MCC on Blind test: 0.22
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.55003285 1.53659248 1.37875795 1.49586344 1.57297134 1.42394137
1.5258956 1.60407376 1.39470887 1.90794826]
mean value: 1.5390785932540894
key: score_time
value: [0.01411986 0.01391315 0.01411891 0.01399922 0.01386738 0.01426673
0.01415634 0.01177144 0.01455188 0.01436853]
mean value: 0.013913345336914063
key: test_mcc
value: [0.8951918 0.92980296 0.82490815 0.8953202 0.75047877 0.89802651
0.89342711 0.78772636 0.78772636 0.85714286]
mean value: 0.851975108089572
key: train_mcc
value: [0.97245522 0.96055211 0.97239383 0.96055211 0.97250878 0.9645744
0.96463421 0.96850394 0.9645744 0.9645744 ]
mean value: 0.9665323428042959
key: test_accuracy
value: [0.94736842 0.96491228 0.9122807 0.94736842 0.875 0.94642857
0.94642857 0.89285714 0.89285714 0.92857143]
mean value: 0.9254072681704261
key: train_accuracy
value: [0.98619329 0.98027613 0.98619329 0.98027613 0.98622047 0.98228346
0.98228346 0.98425197 0.98228346 0.98228346]
mean value: 0.9832545155228378
key: test_fscore
value: [0.94545455 0.96428571 0.91525424 0.94736842 0.87719298 0.94339623
0.94736842 0.88888889 0.89655172 0.92857143]
mean value: 0.9254332589603143
key: train_fscore
value: [0.98613861 0.98031496 0.98613861 0.98023715 0.98613861 0.98224852
0.98217822 0.98425197 0.98224852 0.98231827]
mean value: 0.9832213455229958
key: test_precision
value: [0.96296296 0.96428571 0.9 0.96428571 0.86206897 1.
0.93103448 0.92307692 0.86666667 0.92857143]
mean value: 0.9302952858125272
key: train_precision
value: [0.99203187 0.98031496 0.98809524 0.98023715 0.99203187 0.98418972
0.98804781 0.98425197 0.98418972 0.98039216]
mean value: 0.9853782478667216
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
0.96428571 0.85714286 0.92857143 0.92857143]
mean value: 0.9219211822660098
key: train_recall
value: [0.98031496 0.98031496 0.98418972 0.98023715 0.98031496 0.98031496
0.97637795 0.98425197 0.98031496 0.98425197]
mean value: 0.9810883570383742
key: test_roc_auc
value: [0.94704433 0.96490148 0.91194581 0.9476601 0.875 0.94642857
0.94642857 0.89285714 0.89285714 0.92857143]
mean value: 0.9253694581280789
key: train_roc_auc
value: [0.98620491 0.98027606 0.98618935 0.98027606 0.98622047 0.98228346
0.98228346 0.98425197 0.98228346 0.98228346]
mean value: 0.9832552674986773
key: test_jcc
value: [0.89655172 0.93103448 0.84375 0.9 0.78125 0.89285714
0.9 0.8 0.8125 0.86666667]
mean value: 0.8624610016420361
key: train_jcc
value: [0.97265625 0.96138996 0.97265625 0.96124031 0.97265625 0.96511628
0.96498054 0.96899225 0.96511628 0.96525097]
mean value: 0.9670055337667078
MCC on Blind test: 0.26
Accuracy on Blind test: 0.66
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01483965 0.0120194 0.01096416 0.01053548 0.01014924 0.01097488
0.01111197 0.01054001 0.01057744 0.01146078]
mean value: 0.011317300796508788
key: score_time
value: [0.01083517 0.00851774 0.00850797 0.00900292 0.00824666 0.00808549
0.0081172 0.00822592 0.00818729 0.00816345]
mean value: 0.00858898162841797
key: test_mcc
value: [0.93202124 0.8951918 0.85960591 0.8953202 0.75434227 0.96490128
0.75434227 0.89342711 0.96490128 0.92857143]
mean value: 0.8842624793067261
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.94736842 0.875 0.98214286
0.875 0.94642857 0.98214286 0.96428571]
mean value: 0.9414473684210526
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.94545455 0.93103448 0.94736842 0.88135593 0.98181818
0.88135593 0.94736842 0.98181818 0.96428571]
mean value: 0.942482277561025
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.93103448 0.96428571 0.83870968 1.
0.83870968 0.93103448 1. 0.96428571]
mean value: 0.9431022711890342
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.92857143 0.93103448 0.93103448 0.92857143 0.96428571
0.92857143 0.96428571 0.96428571 0.96428571]
mean value: 0.9433497536945813
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.94704433 0.92980296 0.9476601 0.875 0.98214286
0.875 0.94642857 0.98214286 0.96428571]
mean value: 0.9413793103448276
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.89655172 0.87096774 0.9 0.78787879 0.96428571
0.78787879 0.9 0.96428571 0.93103448]
mean value: 0.8931454381732469
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10266161 0.12123942 0.11855769 0.10826373 0.11115503 0.13033295
0.11559772 0.10858154 0.10145831 0.10418749]
mean value: 0.11220355033874511
key: score_time
value: [0.01758289 0.02243209 0.02058554 0.02079964 0.02117038 0.01818752
0.01786613 0.01750755 0.01726437 0.01872659]
mean value: 0.01921226978302002
key: test_mcc
value: [0.92980296 0.86189955 0.85960591 0.82490815 0.85714286 0.89342711
0.92857143 0.82195294 0.78571429 0.92857143]
mean value: 0.8691596616885752
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.92982456 0.92982456 0.9122807 0.92857143 0.94642857
0.96428571 0.91071429 0.89285714 0.96428571]
mean value: 0.9343984962406016
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.93103448 0.93103448 0.91525424 0.92857143 0.94736842
0.96428571 0.90909091 0.89285714 0.96428571]
mean value: 0.9348068247234632
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.9 0.93103448 0.9 0.92857143 0.93103448
0.96428571 0.92592593 0.89285714 0.96428571]
mean value: 0.9302280605728882
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.96428571
0.96428571 0.89285714 0.89285714 0.96428571]
mean value: 0.9397783251231527
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.93041872 0.92980296 0.91194581 0.92857143 0.94642857
0.96428571 0.91071429 0.89285714 0.96428571]
mean value: 0.9344211822660099
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.87096774 0.87096774 0.84375 0.86666667 0.9
0.93103448 0.83333333 0.80645161 0.93103448]
mean value: 0.8785240545050056
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.71
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00810385 0.00779867 0.00782776 0.0077281 0.00779986 0.00821972
0.00770831 0.0078783 0.00787997 0.00846457]
mean value: 0.007940912246704101
key: score_time
value: [0.00810742 0.00859261 0.00797725 0.00838828 0.00801611 0.0083127
0.00793481 0.00823236 0.00814414 0.00799465]
mean value: 0.008170032501220703
key: test_mcc
value: [0.8951918 0.78940887 0.68472906 0.8615634 0.5118907 0.65814518
0.89342711 0.75434227 0.85933785 0.92857143]
mean value: 0.7836607672751627
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.89473684 0.84210526 0.92982456 0.75 0.82142857
0.94642857 0.875 0.92857143 0.96428571]
mean value: 0.8899749373433584
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94545455 0.89285714 0.84210526 0.93333333 0.77419355 0.8
0.94545455 0.86792453 0.92592593 0.96428571]
mean value: 0.8891534547158085
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.89285714 0.85714286 0.90322581 0.70588235 0.90909091
0.96296296 0.92 0.96153846 0.96428571]
mean value: 0.90399491702338
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.89285714 0.82758621 0.96551724 0.85714286 0.71428571
0.92857143 0.82142857 0.89285714 0.96428571]
mean value: 0.8793103448275862
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94704433 0.89470443 0.84236453 0.92918719 0.75 0.82142857
0.94642857 0.875 0.92857143 0.96428571]
mean value: 0.8899014778325124
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89655172 0.80645161 0.72727273 0.875 0.63157895 0.66666667
0.89655172 0.76666667 0.86206897 0.93103448]
mean value: 0.8059843517429431
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.8
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.33799982 1.31312704 1.34472799 1.3418479 1.33652711 1.34488511
1.34392357 1.36048555 1.3535583 1.39411759]
mean value: 1.3471199989318847
key: score_time
value: [0.09937048 0.09200263 0.0983386 0.09501576 0.09319186 0.09341598
0.09465718 0.09849429 0.09647918 0.09067464]
mean value: 0.09516406059265137
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.8951918 0.85933785 1.
0.92857143 0.89342711 0.93094934 0.92857143]
mean value: 0.922664756643307
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.93103448 1.
0.96428571 0.94736842 0.96296296 0.96428571]
mean value: 0.961379368196865
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96551724 0.93333333 0.9 1.
0.96428571 0.93103448 1. 0.96428571]
mean value: 0.9589490968801314
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9645320197044335
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.94704433 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.90322581 0.87096774 1.
0.93103448 0.9 0.92857143 0.93103448]
mean value: 0.9262452990094814
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.49
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.92916346 0.96192503 0.91043425 0.91636205 0.90995216 0.86325049
0.90527892 0.94523883 0.87693882 0.92262363]
mean value: 0.9141167640686035
key: score_time
value: [0.27692175 0.23238063 0.2464447 0.29479647 0.18034601 0.23226142
0.23399925 0.25210285 0.25105858 0.17118669]
mean value: 0.23714983463287354
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.8951918 0.85714286 1.
0.92857143 0.89342711 0.93094934 0.92857143]
mean value: 0.9224452574728608
key: train_mcc
value: [0.94503515 0.95277969 0.94878539 0.95278262 0.95687833 0.94112724
0.94888508 0.95278544 0.94499908 0.94900279]
mean value: 0.9493060812767512
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [0.97238659 0.97633136 0.97435897 0.97633136 0.97834646 0.97047244
0.97440945 0.97637795 0.97244094 0.97440945]
mean value: 0.9745864976937054
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.92857143 1.
0.96428571 0.94736842 0.96296296 0.96428571]
mean value: 0.9611330627781457
key: train_fscore
value: [0.97276265 0.9765625 0.97445972 0.97647059 0.9785575 0.97076023
0.97455969 0.97647059 0.97265625 0.97465887]
mean value: 0.9747918592411458
key: test_precision
value: [1. 0.93103448 0.96551724 0.93333333 0.92857143 1.
0.96428571 0.93103448 1. 0.96428571]
mean value: 0.9618062397372742
key: train_precision
value: [0.96153846 0.96899225 0.96875 0.9688716 0.96911197 0.96138996
0.9688716 0.97265625 0.96511628 0.96525097]
mean value: 0.9670549325084619
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9609605911330049
key: train_recall
value: [0.98425197 0.98425197 0.98023715 0.98418972 0.98818898 0.98031496
0.98031496 0.98031496 0.98031496 0.98425197]
mean value: 0.9826631601879805
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.94704433 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.97236314 0.97631571 0.97437055 0.97634683 0.97834646 0.97047244
0.97440945 0.97637795 0.97244094 0.97440945]
mean value: 0.974585291463073
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.90322581 0.86666667 1.
0.93103448 0.9 0.92857143 0.93103448]
mean value: 0.9258151914825997
key: train_jcc
value: [0.9469697 0.95419847 0.95019157 0.95402299 0.95801527 0.94318182
0.95038168 0.95402299 0.94676806 0.95057034]
mean value: 0.9508322885933389
MCC on Blind test: 0.2
Accuracy on Blind test: 0.5
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00826478 0.00817037 0.00865364 0.00832152 0.00791717 0.00800776
0.00859976 0.00781083 0.00889087 0.00782251]
mean value: 0.00824592113494873
key: score_time
value: [0.00801754 0.00853825 0.01080704 0.00824523 0.00855088 0.00804639
0.00831342 0.00840735 0.00844622 0.00819445]
mean value: 0.008556675910949708
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
0.67900461 0.75047877 0.64450339 0.82195294]
mean value: 0.766179444196459
key: train_mcc
value: [0.76340037 0.76340037 0.76353762 0.75544282 0.77564465 0.77588525
0.77564465 0.77991449 0.79149195 0.76800824]
mean value: 0.7712370421013379
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.85964912 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.8827694235588972
key: train_accuracy
value: [0.8816568 0.8816568 0.8816568 0.87771203 0.88779528 0.88779528
0.88779528 0.88976378 0.89566929 0.88385827]
mean value: 0.8855359611113699
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.86206897 0.86206897 0.87272727
0.84210526 0.87272727 0.82758621 0.90909091]
mean value: 0.8839058461200022
key: train_fscore
value: [0.8828125 0.8828125 0.8828125 0.87698413 0.88845401 0.88932039
0.88845401 0.89147287 0.89668616 0.88543689]
mean value: 0.8865245960082
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
0.82758621 0.88888889 0.8 0.92592593]
mean value: 0.8785312899106003
key: train_precision
value: [0.87596899 0.87596899 0.87258687 0.88047809 0.88326848 0.87739464
0.88326848 0.8778626 0.88803089 0.87356322]
mean value: 0.8788391247569809
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
0.85714286 0.85714286 0.85714286 0.89285714]
mean value: 0.8900246305418719
key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.87351779 0.89370079 0.9015748
0.89370079 0.90551181 0.90551181 0.8976378 ]
mean value: 0.8943963773303041
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.85960591 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.882820197044335
key: train_roc_auc
value: [0.88164078 0.88164078 0.88167969 0.87770378 0.88779528 0.88779528
0.88779528 0.88976378 0.89566929 0.88385827]
mean value: 0.8855342192897825
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
0.72727273 0.77419355 0.70588235 0.83333333]
mean value: 0.7939704444827784
key: train_jcc
value: [0.79020979 0.79020979 0.79020979 0.78091873 0.79929577 0.8006993
0.79929577 0.8041958 0.81272085 0.79442509]
mean value: 0.7962180687899996
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07053018 0.055233 0.05712581 0.05738688 0.05596948 0.2296176
0.04953313 0.04858136 0.05947566 0.05467701]
mean value: 0.07381300926208496
key: score_time
value: [0.01033711 0.01042008 0.01019645 0.01021481 0.01025701 0.01056623
0.01241708 0.00986052 0.01005435 0.01003385]
mean value: 0.010435748100280761
key: test_mcc
value: [0.96547546 0.92980296 0.96547546 0.96547546 0.89342711 1.
0.96490128 0.89342711 0.96490128 0.92857143]
mean value: 0.9471457541234694
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 0.98245614 0.94642857 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9733709273182957
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.98305085 0.98305085 0.94736842 1.
0.98245614 0.94736842 0.98181818 0.96428571]
mean value: 0.9735502469579187
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96666667 0.96666667 0.93103448 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9689490968801313
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 1. 1. 0.96428571 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.9785714285714285
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.98214286 0.98214286 0.94642857 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9732758620689657
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.96666667 0.96666667 0.9 1.
0.96551724 0.9 0.96428571 0.93103448]
mean value: 0.9489490968801314
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.36
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01379895 0.04094553 0.04091215 0.04156613 0.04213095 0.04107809
0.04116917 0.04115582 0.03444266 0.04156756]
mean value: 0.03787670135498047
key: score_time
value: [0.01027441 0.021981 0.01984763 0.02085829 0.02091908 0.02200484
0.01950645 0.01698136 0.02125549 0.01091409]
mean value: 0.018454265594482423
key: test_mcc
value: [0.85960591 0.8953202 0.85960591 0.82490815 0.75434227 0.78772636
0.75434227 0.71611487 0.68250015 0.82195294]
mean value: 0.7956419031963872
key: train_mcc
value: [0.86611359 0.85893744 0.84648438 0.84263794 0.86253233 0.85105352
0.83910959 0.85105352 0.85545187 0.83890131]
mean value: 0.8512275503641218
key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807 0.875 0.89285714
0.875 0.85714286 0.83928571 0.91071429]
mean value: 0.8969298245614035
key: train_accuracy
value: [0.93293886 0.92899408 0.92307692 0.92110454 0.93110236 0.92519685
0.91929134 0.92519685 0.92716535 0.91929134]
mean value: 0.925335849291028
key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.88888889
0.88135593 0.85185185 0.84745763 0.90909091]
mean value: 0.898222971102789
key: train_fscore
value: [0.93385214 0.93076923 0.92397661 0.92217899 0.93203883 0.92664093
0.92069632 0.92664093 0.92898273 0.92038835]
mean value: 0.9266165055588382
key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9 0.83870968 0.92307692
0.83870968 0.88461538 0.80645161 0.92592593]
mean value: 0.8908129595448839
key: train_precision
value: [0.92307692 0.90977444 0.91153846 0.90804598 0.91954023 0.90909091
0.90494297 0.90909091 0.90636704 0.90804598]
mean value: 0.9109513829773443
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.85714286
0.92857143 0.82142857 0.89285714 0.89285714]
mean value: 0.9076354679802956
key: train_recall
value: [0.94488189 0.95275591 0.93675889 0.93675889 0.94488189 0.94488189
0.93700787 0.94488189 0.95275591 0.93307087]
mean value: 0.9428635896797485
key: test_roc_auc
value: [0.92980296 0.9476601 0.92980296 0.91194581 0.875 0.89285714
0.875 0.85714286 0.83928571 0.91071429]
mean value: 0.8969211822660099
key: train_roc_auc
value: [0.93291525 0.92894712 0.92310386 0.92113535 0.93110236 0.92519685
0.91929134 0.92519685 0.92716535 0.91929134]
mean value: 0.9253345678628117
key: test_jcc
value: [0.86666667 0.9 0.87096774 0.84375 0.78787879 0.8
0.78787879 0.74193548 0.73529412 0.83333333]
mean value: 0.8167704919211086
key: train_jcc
value: [0.87591241 0.8705036 0.85869565 0.85559567 0.87272727 0.86330935
0.85304659 0.86330935 0.86738351 0.85251799]
mean value: 0.8633001396827011
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02298999 0.00778127 0.00742817 0.00751019 0.00740218 0.00754929
0.00757813 0.00776744 0.00745249 0.00765824]
mean value: 0.009111738204956055
key: score_time
value: [0.00836825 0.00810742 0.00794792 0.00793934 0.00797534 0.0079298
0.00799894 0.00800514 0.00802684 0.00787163]
mean value: 0.00801706314086914
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.78940887 0.71611487 0.75047877
0.67900461 0.75047877 0.64450339 0.82195294]
mean value: 0.7731991486299565
key: train_mcc
value: [0.76340037 0.76340037 0.76741581 0.77919572 0.78351922 0.77588525
0.78749923 0.78361641 0.79139378 0.77186893]
mean value: 0.776719508855672
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.89473684 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.8862781954887218
key: train_accuracy
value: [0.8816568 0.8816568 0.88362919 0.88954635 0.89173228 0.88779528
0.89370079 0.89173228 0.89566929 0.88582677]
mean value: 0.8882945844787152
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.89655172 0.86206897 0.87272727
0.84210526 0.87272727 0.82758621 0.90909091]
mean value: 0.8873541219820712
key: train_fscore
value: [0.8828125 0.8828125 0.88454012 0.89019608 0.89236791 0.88932039
0.89453125 0.89278752 0.8962818 0.88715953]
mean value: 0.8892809598096044
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.89655172 0.83333333 0.88888889
0.82758621 0.88888889 0.8 0.92592593]
mean value: 0.8819795657726692
key: train_precision
value: [0.87596899 0.87596899 0.87596899 0.88326848 0.88715953 0.87739464
0.8875969 0.88416988 0.89105058 0.87692308]
mean value: 0.8815470072299069
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
0.85714286 0.85714286 0.85714286 0.89285714]
mean value: 0.8934729064039408
key: train_recall
value: [0.88976378 0.88976378 0.89328063 0.8972332 0.8976378 0.9015748
0.9015748 0.9015748 0.9015748 0.8976378 ]
mean value: 0.8971616196196819
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.89470443 0.85714286 0.875
0.83928571 0.875 0.82142857 0.91071429]
mean value: 0.8863300492610838
key: train_roc_auc
value: [0.88164078 0.88164078 0.88364819 0.88956148 0.89173228 0.88779528
0.89370079 0.89173228 0.89566929 0.88582677]
mean value: 0.8882947931903769
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.8125 0.75757576 0.77419355
0.72727273 0.77419355 0.70588235 0.83333333]
mean value: 0.7994628687252027
key: train_jcc
value: [0.79020979 0.79020979 0.79298246 0.80212014 0.80565371 0.8006993
0.80918728 0.80633803 0.81205674 0.7972028 ]
mean value: 0.8006660030961745
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00975227 0.01262665 0.01330996 0.01325607 0.01203847 0.01280475
0.01180601 0.01305366 0.01356101 0.01355767]
mean value: 0.012576651573181153
key: score_time
value: [0.00793934 0.01009583 0.01011109 0.01049399 0.01059914 0.01051426
0.01052403 0.0104363 0.01050282 0.01055479]
mean value: 0.010177159309387207
key: test_mcc
value: [0.93202124 0.8953202 0.89952865 0.86189955 0.75047877 0.93094934
0.79385662 0.78571429 0.56573571 0.78571429]
mean value: 0.8201218647363345
key: train_mcc
value: [0.90138807 0.90933566 0.85396037 0.85053095 0.9021413 0.91064232
0.84093872 0.88232751 0.83427977 0.86279984]
mean value: 0.8748344497092373
key: test_accuracy
value: [0.96491228 0.94736842 0.94736842 0.92982456 0.875 0.96428571
0.89285714 0.89285714 0.76785714 0.89285714]
mean value: 0.9075187969924812
key: train_accuracy
value: [0.95069034 0.95463511 0.92504931 0.92504931 0.9507874 0.95472441
0.91929134 0.94094488 0.91338583 0.92913386]
mean value: 0.9363691779651804
key: test_fscore
value: [0.96296296 0.94736842 0.95081967 0.92857143 0.87272727 0.96296296
0.9 0.89285714 0.8 0.89285714]
mean value: 0.9111127006122692
key: train_fscore
value: [0.95069034 0.95445545 0.92830189 0.92607004 0.9498998 0.95353535
0.92220114 0.94186047 0.91881919 0.93258427]
mean value: 0.9378417921178791
key: test_precision
value: [1. 0.93103448 0.90625 0.96296296 0.88888889 1.
0.84375 0.89285714 0.7027027 0.89285714]
mean value: 0.9021303323027461
key: train_precision
value: [0.95256917 0.96015936 0.88808664 0.91187739 0.96734694 0.97925311
0.89010989 0.92748092 0.86458333 0.88928571]
mean value: 0.9230752474313746
key: test_recall
value: [0.92857143 0.96428571 1. 0.89655172 0.85714286 0.92857143
0.96428571 0.89285714 0.92857143 0.89285714]
mean value: 0.9253694581280788
key: train_recall
value: [0.9488189 0.9488189 0.97233202 0.94071146 0.93307087 0.92913386
0.95669291 0.95669291 0.98031496 0.98031496]
mean value: 0.9546901745977405
key: test_roc_auc
value: [0.96428571 0.9476601 0.94642857 0.93041872 0.875 0.96428571
0.89285714 0.89285714 0.76785714 0.89285714]
mean value: 0.9074507389162563
key: train_roc_auc
value: [0.95069403 0.9546466 0.92514239 0.92508014 0.9507874 0.95472441
0.91929134 0.94094488 0.91338583 0.92913386]
mean value: 0.9363830879835673
key: test_jcc
value: [0.92857143 0.9 0.90625 0.86666667 0.77419355 0.92857143
0.81818182 0.80645161 0.66666667 0.80645161]
mean value: 0.8402004782851558
key: train_jcc
value: [0.90601504 0.91287879 0.86619718 0.86231884 0.90458015 0.91119691
0.8556338 0.89010989 0.84982935 0.87368421]
mean value: 0.8832444168008685
MCC on Blind test: 0.16
Accuracy on Blind test: 0.45
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01402736 0.01417279 0.01328135 0.01392245 0.01277757 0.01429009
0.01231146 0.01305246 0.01252604 0.01286674]
mean value: 0.013322830200195312
key: score_time
value: [0.0107367 0.0107584 0.01064038 0.01052928 0.01053858 0.01060414
0.01047802 0.01054215 0.01091051 0.01058197]
mean value: 0.010632014274597168
key: test_mcc
value: [0.93202124 0.83703659 0.82942474 0.82490815 0.64951905 0.93094934
0.82195294 0.85714286 0.59628479 0.92857143]
mean value: 0.8207811130466791
key: train_mcc
value: [0.89234379 0.81176962 0.87340231 0.8905544 0.89426234 0.88323242
0.90174953 0.91732994 0.87948771 0.89075842]
mean value: 0.8834890481349534
key: test_accuracy
value: [0.96491228 0.9122807 0.9122807 0.9122807 0.82142857 0.96428571
0.91071429 0.92857143 0.78571429 0.96428571]
mean value: 0.9076754385964912
key: train_accuracy
value: [0.94477318 0.89940828 0.93491124 0.94477318 0.94685039 0.94094488
0.9507874 0.95866142 0.93897638 0.94488189]
mean value: 0.9404968239916756
key: test_fscore
value: [0.96296296 0.90196078 0.90909091 0.91525424 0.83333333 0.96551724
0.9122807 0.92857143 0.8125 0.96428571]
mean value: 0.9105757312979906
key: train_fscore
value: [0.94262295 0.88984881 0.93167702 0.94594595 0.94777563 0.94252874
0.95126706 0.95874263 0.94072658 0.94615385]
mean value: 0.9397289204487953
key: test_precision
value: [1. 1. 0.96153846 0.9 0.78125 0.93333333
0.89655172 0.92857143 0.72222222 0.96428571]
mean value: 0.9087752884089091
key: train_precision
value: [0.98290598 0.98564593 0.97826087 0.9245283 0.93155894 0.91791045
0.94208494 0.95686275 0.91449814 0.92481203]
mean value: 0.9459068329016868
key: test_recall
value: [0.92857143 0.82142857 0.86206897 0.93103448 0.89285714 1.
0.92857143 0.92857143 0.92857143 0.96428571]
mean value: 0.9185960591133004
key: train_recall
value: [0.90551181 0.81102362 0.88932806 0.96837945 0.96456693 0.96850394
0.96062992 0.96062992 0.96850394 0.96850394]
mean value: 0.9365581525629454
key: test_roc_auc
value: [0.96428571 0.91071429 0.91317734 0.91194581 0.82142857 0.96428571
0.91071429 0.92857143 0.78571429 0.96428571]
mean value: 0.907512315270936
key: train_roc_auc
value: [0.94485077 0.89958296 0.93482151 0.94481964 0.94685039 0.94094488
0.9507874 0.95866142 0.93897638 0.94488189]
mean value: 0.940517724316081
key: test_jcc
value: [0.92857143 0.82142857 0.83333333 0.84375 0.71428571 0.93333333
0.83870968 0.86666667 0.68421053 0.93103448]
mean value: 0.8395323734112813
key: train_jcc
value: [0.89147287 0.80155642 0.87209302 0.8974359 0.90073529 0.89130435
0.9070632 0.92075472 0.88808664 0.89781022]
mean value: 0.8868312626670497
MCC on Blind test: 0.23
Accuracy on Blind test: 0.67
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10967422 0.09465766 0.0946455 0.09465408 0.09431386 0.09605312
0.09578514 0.09517407 0.09609222 0.09636235]
mean value: 0.09674122333526611
key: score_time
value: [0.01450157 0.01411438 0.01444912 0.0144279 0.0141592 0.01532269
0.01432991 0.01458526 0.01432729 0.01431394]
mean value: 0.014453125
key: test_mcc
value: [0.93202124 0.92980296 0.8953202 0.93202124 0.82618439 0.96490128
0.96490128 0.89342711 0.96490128 0.89342711]
mean value: 0.919690809367707
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.96491228 0.91071429 0.98214286
0.98214286 0.94642857 0.98214286 0.94642857]
mean value: 0.9592105263157894
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.96428571 0.94736842 0.96666667 0.91525424 0.98181818
0.98245614 0.94736842 0.98181818 0.94545455]
mean value: 0.9595453472750529
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96428571 0.93548387 0.87096774 1.
0.96551724 0.93103448 1. 0.96296296]
mean value: 0.9594537728575548
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.96428571 0.93103448 1. 0.96428571 0.96428571
1. 0.96428571 0.96428571 0.92857143]
mean value: 0.9609605911330049
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.96490148 0.9476601 0.96428571 0.91071429 0.98214286
0.98214286 0.94642857 0.98214286 0.94642857]
mean value: 0.9591133004926109
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.93103448 0.9 0.93548387 0.84375 0.96428571
0.96551724 0.9 0.96428571 0.89655172]
mean value: 0.9229480176386461
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.14
Accuracy on Blind test: 0.43
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03699589 0.04877973 0.05854607 0.05341744 0.04653335 0.03373337
0.049371 0.03414083 0.04613113 0.05712962]
mean value: 0.04647784233093262
key: score_time
value: [0.02744269 0.02593279 0.03697324 0.0348525 0.0197053 0.0282557
0.0171566 0.02019095 0.02783036 0.03664637]
mean value: 0.027498650550842284
key: test_mcc
value: [0.96547546 0.92980296 0.8953202 0.93202124 0.82618439 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.9266653398520664
key: train_mcc
value: [0.99214142 0.99211042 0.99214118 1. 0.99212598 0.98428248
0.98825791 1. 0.99212598 0.98819663]
mean value: 0.9921382021238081
key: test_accuracy
value: [0.98245614 0.96491228 0.94736842 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9627506265664161
key: train_accuracy
value: [0.99605523 0.99605523 0.99605523 1. 0.99606299 0.99212598
0.99409449 1. 0.99606299 0.99409449]
mean value: 0.9960606625355263
key: test_fscore
value: [0.98181818 0.96428571 0.94736842 0.96666667 0.91525424 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9632466459763516
key: train_fscore
value: [0.99604743 0.99606299 0.99603175 1. 0.99606299 0.99209486
0.99405941 1. 0.99606299 0.99408284]
mean value: 0.9960505261077098
key: test_precision
value: [1. 0.96428571 0.96428571 0.93548387 0.87096774 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9595860479898299
key: train_precision
value: [1. 0.99606299 1. 1. 0.99606299 0.99603175
1. 1. 0.99606299 0.99604743]
mean value: 0.9980268153239739
key: test_recall
value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.968103448275862
key: train_recall
value: [0.99212598 0.99606299 0.99209486 1. 0.99606299 0.98818898
0.98818898 1. 0.99606299 0.99212598]
mean value: 0.9940913759297875
key: test_roc_auc
value: [0.98214286 0.96490148 0.9476601 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9626847290640395
key: train_roc_auc
value: [0.99606299 0.99605521 0.99604743 1. 0.99606299 0.99212598
0.99409449 1. 0.99606299 0.99409449]
mean value: 0.9960606579315926
key: test_jcc
value: [0.96428571 0.93103448 0.9 0.93548387 0.84375 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9299677220721436
key: train_jcc
value: [0.99212598 0.99215686 0.99209486 1. 0.99215686 0.98431373
0.98818898 1. 0.99215686 0.98823529]
mean value: 0.9921429430133137
MCC on Blind test: 0.15
Accuracy on Blind test: 0.38
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.15483451 0.16543531 0.17816734 0.18977332 0.14207339 0.17039871
0.14135075 0.18826556 0.17411089 0.15683222]
mean value: 0.16612420082092286
key: score_time
value: [0.01990747 0.02101636 0.02146673 0.02004623 0.02186728 0.02010036
0.01258993 0.02006197 0.02341485 0.02047729]
mean value: 0.020094847679138182
key: test_mcc
value: [0.8953202 0.86189955 0.82512315 0.79110556 0.75047877 0.75047877
0.68250015 0.75047877 0.64951905 0.85714286]
mean value: 0.7814046839086336
key: train_mcc
value: [0.85051239 0.85019923 0.84231823 0.8428767 0.85465533 0.84293789
0.83890131 0.85513299 0.87062545 0.84677832]
mean value: 0.8494937826202889
key: test_accuracy
value: [0.94736842 0.92982456 0.9122807 0.89473684 0.875 0.875
0.83928571 0.875 0.82142857 0.92857143]
mean value: 0.8898496240601503
key: train_accuracy
value: [0.92504931 0.92504931 0.92110454 0.92110454 0.92716535 0.92125984
0.91929134 0.92716535 0.93503937 0.92322835]
mean value: 0.9245457298606905
key: test_fscore
value: [0.94736842 0.93103448 0.9122807 0.9 0.87719298 0.87272727
0.84745763 0.87272727 0.83333333 0.92857143]
mean value: 0.892269352249973
key: train_fscore
value: [0.92635659 0.92578125 0.92156863 0.92248062 0.92815534 0.92248062
0.92038835 0.92870906 0.93617021 0.92427184]
mean value: 0.925636250953157
key: test_precision
value: [0.93103448 0.9 0.92857143 0.87096774 0.86206897 0.88888889
0.80645161 0.88888889 0.78125 0.92857143]
mean value: 0.8786693438035207
key: train_precision
value: [0.91221374 0.91860465 0.91439689 0.90494297 0.91570881 0.90839695
0.90804598 0.90943396 0.92015209 0.91187739]
mean value: 0.9123773428551641
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.85714286
0.89285714 0.85714286 0.89285714 0.92857143]
mean value: 0.9077586206896552
key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.94071146 0.94094488 0.93700787
0.93307087 0.9488189 0.95275591 0.93700787]
mean value: 0.9393187264635399
key: test_roc_auc
value: [0.9476601 0.93041872 0.91256158 0.89408867 0.875 0.875
0.83928571 0.875 0.82142857 0.92857143]
mean value: 0.8899014778325124
key: train_roc_auc
value: [0.9250179 0.92503346 0.92111979 0.92114313 0.92716535 0.92125984
0.91929134 0.92716535 0.93503937 0.92322835]
mean value: 0.9245463882232112
key: test_jcc
value: [0.9 0.87096774 0.83870968 0.81818182 0.78125 0.77419355
0.73529412 0.77419355 0.71428571 0.86666667]
mean value: 0.807374283291029
key: train_jcc
value: [0.86281588 0.86181818 0.85454545 0.85611511 0.86594203 0.85611511
0.85251799 0.86690647 0.88 0.85920578]
mean value: 0.8615982002257956
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.25657248 0.24637365 0.24630475 0.24570489 0.24687362 0.24694991
0.24950528 0.24740005 0.24689674 0.24667573]
mean value: 0.2479257106781006
key: score_time
value: [0.00848842 0.00830841 0.00831699 0.00836349 0.00849056 0.00834179
0.00837541 0.00853562 0.00849915 0.00830841]
mean value: 0.008402824401855469
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1.
0.96490128 0.89342711 0.96490128 0.92857143]
mean value: 0.9330856656296584
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9662907268170426
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807 1.
0.98245614 0.94736842 0.98181818 0.96428571]
mean value: 0.9666496963411664
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.93548387 0.89655172 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9622675989194343
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.971551724137931
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9661945812807883
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1.
0.96551724 0.9 0.96428571 0.93103448]
mean value: 0.9363684517188411
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.3
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01195836 0.01368475 0.01422262 0.01405859 0.01395178 0.01564765
0.01407719 0.01829219 0.02588701 0.01483369]
mean value: 0.0156613826751709
key: score_time
value: [0.01075363 0.0107801 0.01071596 0.01084328 0.01077461 0.01080632
0.01084328 0.01160669 0.01160884 0.01082087]
mean value: 0.010955357551574707
key: test_mcc
value: [0.58069726 0.65466436 0.5920535 0.56277738 0.30588765 0.43876345
0.77459667 0.64116714 0.57735027 0.55339859]
mean value: 0.5681356266624504
key: train_mcc
value: [0.6451496 0.68602482 0.64393328 0.68142563 0.57742076 0.7295157
0.62763342 0.69688549 0.64324077 0.65891447]
mean value: 0.6590143937690011
key: test_accuracy
value: [0.75438596 0.8245614 0.77192982 0.77192982 0.64285714 0.71428571
0.875 0.80357143 0.75 0.75 ]
mean value: 0.7658521303258146
key: train_accuracy
value: [0.79684418 0.82840237 0.79487179 0.82248521 0.7519685 0.86220472
0.78740157 0.83070866 0.79724409 0.80708661]
mean value: 0.8079217723524205
key: test_fscore
value: [0.66666667 0.80769231 0.72340426 0.74509804 0.56521739 0.68
0.85714286 0.76595745 0.66666667 0.68181818]
mean value: 0.7159663812634374
key: train_fscore
value: [0.74816626 0.8 0.74257426 0.78773585 0.671875 0.85355649
0.73399015 0.79906542 0.74939173 0.76442308]
mean value: 0.7650778223767692
key: test_precision
value: [1. 0.875 0.94444444 0.86363636 0.72222222 0.77272727
1. 0.94736842 1. 0.9375 ]
mean value: 0.9062898724082935
key: train_precision
value: [0.98709677 0.96132597 0.99337748 0.97660819 0.99230769 0.91071429
0.98026316 0.98275862 0.98089172 0.98148148]
mean value: 0.9746825369455663
key: test_recall
value: [0.5 0.75 0.5862069 0.65517241 0.46428571 0.60714286
0.75 0.64285714 0.5 0.53571429]
mean value: 0.5991379310344828
key: train_recall
value: [0.6023622 0.68503937 0.59288538 0.66007905 0.50787402 0.80314961
0.58661417 0.67322835 0.60629921 0.62598425]
mean value: 0.6343515607979833
key: test_roc_auc
value: [0.75 0.82327586 0.77524631 0.77401478 0.64285714 0.71428571
0.875 0.80357143 0.75 0.75 ]
mean value: 0.7658251231527093
key: train_roc_auc
value: [0.79722853 0.82868569 0.79447418 0.82216551 0.7519685 0.86220472
0.78740157 0.83070866 0.79724409 0.80708661]
mean value: 0.8079168093118795
key: test_jcc
value: [0.5 0.67741935 0.56666667 0.59375 0.39393939 0.51515152
0.75 0.62068966 0.5 0.51724138]
mean value: 0.5634857965079044
key: train_jcc
value: [0.59765625 0.66666667 0.59055118 0.64980545 0.50588235 0.74452555
0.57976654 0.66536965 0.59922179 0.61867704]
mean value: 0.621812246508153
MCC on Blind test: 0.37
Accuracy on Blind test: 0.82
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02161765 0.04498935 0.02991724 0.02177143 0.01122379 0.01113892
0.01118159 0.02967238 0.01116443 0.01118255]
mean value: 0.020385932922363282
key: score_time
value: [0.01993227 0.02460504 0.02005219 0.01057243 0.0105114 0.01053619
0.01052952 0.01053238 0.010499 0.0106318 ]
mean value: 0.013840222358703613
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.79110556 0.71611487 0.82195294
0.71611487 0.71611487 0.68250015 0.82195294]
mean value: 0.7916102525004516
key: train_mcc
value: [0.81126698 0.82324487 0.81877755 0.81895888 0.82769588 0.81142619
0.81142619 0.82718204 0.83529327 0.8154727 ]
mean value: 0.8200744570697928
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.89473684 0.85714286 0.91071429
0.85714286 0.85714286 0.83928571 0.91071429]
mean value: 0.8951441102756892
key: train_accuracy
value: [0.90532544 0.9112426 0.90927022 0.90927022 0.91338583 0.90551181
0.90551181 0.91338583 0.91732283 0.90748031]
mean value: 0.9097706906459178
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.9 0.86206897 0.90909091
0.86206897 0.85185185 0.84745763 0.90909091]
mean value: 0.8967400553050681
key: train_fscore
value: [0.90733591 0.9132948 0.91015625 0.91050584 0.91538462 0.90697674
0.90697674 0.91472868 0.91891892 0.90909091]
mean value: 0.9113369405536723
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.87096774 0.83333333 0.92592593
0.83333333 0.88461538 0.80645161 0.92592593]
mean value: 0.8873656706248475
key: train_precision
value: [0.89015152 0.89433962 0.8996139 0.89655172 0.89473684 0.89312977
0.89312977 0.90076336 0.90151515 0.89353612]
mean value: 0.8957467777601632
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
0.89285714 0.82142857 0.89285714 0.89285714]
mean value: 0.9076354679802956
key: train_recall
value: [0.92519685 0.93307087 0.92094862 0.92490119 0.93700787 0.92125984
0.92125984 0.92913386 0.93700787 0.92519685]
mean value: 0.9274983660639258
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.89408867 0.85714286 0.91071429
0.85714286 0.85714286 0.83928571 0.91071429]
mean value: 0.8951354679802956
key: train_roc_auc
value: [0.90528617 0.91119946 0.90929321 0.90930099 0.91338583 0.90551181
0.90551181 0.91338583 0.91732283 0.90748031]
mean value: 0.9097678254645047
key: test_jcc
value: [0.9 0.9 0.87096774 0.81818182 0.75757576 0.83333333
0.75757576 0.74193548 0.73529412 0.83333333]
mean value: 0.814819734345351
key: train_jcc
value: [0.83038869 0.84042553 0.83512545 0.83571429 0.84397163 0.82978723
0.82978723 0.84285714 0.85 0.83333333]
mean value: 0.8371390533718615
MCC on Blind test: 0.25
Accuracy on Blind test: 0.71
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.18555856 0.20449281 0.19151855 0.191679 0.19248724 0.19242501
0.20449567 0.27775383 0.19228506 0.1919651 ]
mean value: 0.20246608257293702
key: score_time
value: [0.02051473 0.01998162 0.02048826 0.02080917 0.02009439 0.01971388
0.0109446 0.02007937 0.01075292 0.01076293]
mean value: 0.017414188385009764
key: test_mcc
value: [0.85960591 0.8953202 0.85960591 0.82490815 0.75434227 0.82195294
0.71611487 0.71611487 0.68250015 0.82195294]
mean value: 0.7952418219423117
key: train_mcc
value: [0.86225372 0.8551535 0.84648438 0.83474492 0.86253233 0.8431734
0.81142619 0.85105352 0.83529327 0.8154727 ]
mean value: 0.8417587938288557
key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807 0.875 0.91071429
0.85714286 0.85714286 0.83928571 0.91071429]
mean value: 0.8969298245614035
key: train_accuracy
value: [0.93096647 0.9270217 0.92307692 0.91715976 0.93110236 0.92125984
0.90551181 0.92519685 0.91732283 0.90748031]
mean value: 0.9206098867819037
key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
0.86206897 0.85185185 0.84745763 0.90909091]
mean value: 0.8983144764543761
key: train_fscore
value: [0.93203883 0.92898273 0.92397661 0.91828794 0.93203883 0.92277992
0.90697674 0.92664093 0.91891892 0.90909091]
mean value: 0.9219732362977793
key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9 0.83870968 0.92592593
0.83333333 0.88461538 0.80645161 0.92592593]
mean value: 0.8905602254211821
key: train_precision
value: [0.91954023 0.90636704 0.91153846 0.90421456 0.91954023 0.90530303
0.89312977 0.90909091 0.90151515 0.89353612]
mean value: 0.9063775505468512
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.89285714 0.82142857 0.89285714 0.89285714]
mean value: 0.9076354679802956
key: train_recall
value: [0.94488189 0.95275591 0.93675889 0.93280632 0.94488189 0.94094488
0.92125984 0.94488189 0.93700787 0.92519685]
mean value: 0.9381376241013352
key: test_roc_auc
value: [0.92980296 0.9476601 0.92980296 0.91194581 0.875 0.91071429
0.85714286 0.85714286 0.83928571 0.91071429]
mean value: 0.8969211822660099
key: train_roc_auc
value: [0.93093897 0.92697084 0.92310386 0.91719056 0.93110236 0.92125984
0.90551181 0.92519685 0.91732283 0.90748031]
mean value: 0.920607824219601
key: test_jcc
value: [0.86666667 0.9 0.87096774 0.84375 0.78787879 0.83333333
0.75757576 0.74193548 0.73529412 0.83333333]
mean value: 0.817073522224139
key: train_jcc
value: [0.87272727 0.86738351 0.85869565 0.84892086 0.87272727 0.85663082
0.82978723 0.86330935 0.85 0.83333333]
mean value: 0.8553515317749246
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03999615 0.02583027 0.02410555 0.02264023 0.02619028 0.02323699
0.02574015 0.02130461 0.0231998 0.02364349]
mean value: 0.0255887508392334
key: score_time
value: [0.01082921 0.01091146 0.01050711 0.01049757 0.01048684 0.01047969
0.01068377 0.01047754 0.01049089 0.01047397]
mean value: 0.010583806037902831
key: test_mcc
value: [0.8953202 0.8953202 0.82512315 0.82490815 0.71611487 0.89342711
0.71611487 0.75047877 0.68250015 0.85933785]
mean value: 0.8058645326851578
key: train_mcc
value: [0.83454496 0.83472439 0.83070006 0.83456039 0.85486752 0.81527029
0.83505996 0.83076661 0.8355787 0.81511857]
mean value: 0.8321191457866195
key: test_accuracy
value: [0.94736842 0.94736842 0.9122807 0.9122807 0.85714286 0.94642857
0.85714286 0.875 0.83928571 0.92857143]
mean value: 0.9022869674185463
key: train_accuracy
value: [0.91715976 0.91715976 0.91518738 0.91715976 0.92716535 0.90748031
0.91732283 0.91535433 0.91732283 0.90748031]
mean value: 0.9158792650918636
key: test_fscore
value: [0.94736842 0.94736842 0.9122807 0.91525424 0.86206897 0.94545455
0.86206897 0.87272727 0.84745763 0.92592593]
mean value: 0.9037975083408656
key: train_fscore
value: [0.91828794 0.91860465 0.91617934 0.91796875 0.92843327 0.90873786
0.91860465 0.91585127 0.91923077 0.90838207]
mean value: 0.9170280567760439
key: test_precision
value: [0.93103448 0.93103448 0.92857143 0.9 0.83333333 0.96296296
0.83333333 0.88888889 0.80645161 0.96153846]
mean value: 0.8977148987048875
key: train_precision
value: [0.90769231 0.90458015 0.90384615 0.90733591 0.91254753 0.89655172
0.90458015 0.91050584 0.89849624 0.8996139 ]
mean value: 0.9045749903664201
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
0.89285714 0.85714286 0.89285714 0.89285714]
mean value: 0.9113300492610837
key: train_recall
value: [0.92913386 0.93307087 0.92885375 0.92885375 0.94488189 0.92125984
0.93307087 0.92125984 0.94094488 0.91732283]
mean value: 0.9298652391771187
key: test_roc_auc
value: [0.9476601 0.9476601 0.91256158 0.91194581 0.85714286 0.94642857
0.85714286 0.875 0.83928571 0.92857143]
mean value: 0.9023399014778325
key: train_roc_auc
value: [0.9171361 0.91712832 0.91521428 0.91718278 0.92716535 0.90748031
0.91732283 0.91535433 0.91732283 0.90748031]
mean value: 0.9158787463819987
key: test_jcc
value: [0.9 0.9 0.83870968 0.84375 0.75757576 0.89655172
0.75757576 0.77419355 0.73529412 0.86206897]
mean value: 0.8265719548260198
key: train_jcc
value: [0.84892086 0.84946237 0.84532374 0.84837545 0.86642599 0.83274021
0.84946237 0.84476534 0.85053381 0.83214286]
mean value: 0.8468153000998123
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.74657226 0.72350025 0.66616726 0.69195914 0.85434008 0.67523313
0.67365026 0.74283338 0.70560384 0.68593574]
mean value: 0.716579532623291
key: score_time
value: [0.01196027 0.01932144 0.020437 0.01222825 0.01223254 0.01219296
0.01108098 0.01215911 0.01234174 0.01253176]
mean value: 0.013648605346679688
key: test_mcc
value: [0.93202124 0.92980296 0.92980296 0.85960591 0.78772636 1.
0.85933785 0.85714286 0.78772636 0.85714286]
mean value: 0.8800309350106305
key: train_mcc
value: [0.93691352 0.93691352 0.94480151 0.93691156 0.93703692 0.93703692
0.92913386 0.9332517 0.92520402 0.9330781 ]
mean value: 0.9350281642225636
key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.92982456 0.89285714 1.
0.92857143 0.92857143 0.89285714 0.92857143]
mean value: 0.9395989974937343
key: train_accuracy
value: [0.96844181 0.96844181 0.97238659 0.96844181 0.96850394 0.96850394
0.96456693 0.96653543 0.96259843 0.96653543]
mean value: 0.9674956126046373
key: test_fscore
value: [0.96296296 0.96428571 0.96551724 0.93103448 0.89655172 1.
0.93103448 0.92857143 0.89655172 0.92857143]
mean value: 0.9405081189563949
key: train_fscore
value: [0.96837945 0.96837945 0.97222222 0.96825397 0.96837945 0.96837945
0.96456693 0.96620278 0.96267191 0.96646943]
mean value: 0.9673905023176848
key: test_precision
value: [1. 0.96428571 0.96551724 0.93103448 0.86666667 1.
0.9 0.92857143 0.86666667 0.92857143]
mean value: 0.9351313628899836
key: train_precision
value: [0.97222222 0.97222222 0.97609562 0.97211155 0.97222222 0.97222222
0.96456693 0.97590361 0.96078431 0.96837945]
mean value: 0.9706730364161126
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 1.
0.96428571 0.92857143 0.92857143 0.92857143]
mean value: 0.9467980295566503
key: train_recall
value: [0.96456693 0.96456693 0.96837945 0.96442688 0.96456693 0.96456693
0.96456693 0.95669291 0.96456693 0.96456693]
mean value: 0.9641467741433506
key: test_roc_auc
value: [0.96428571 0.96490148 0.96490148 0.92980296 0.89285714 1.
0.92857143 0.92857143 0.89285714 0.92857143]
mean value: 0.9395320197044336
key: train_roc_auc
value: [0.96844947 0.96844947 0.9723787 0.96843391 0.96850394 0.96850394
0.96456693 0.96653543 0.96259843 0.96653543]
mean value: 0.9674955650306558
key: test_jcc
value: [0.92857143 0.93103448 0.93333333 0.87096774 0.8125 1.
0.87096774 0.86666667 0.8125 0.86666667]
mean value: 0.8893208061867683
key: train_jcc
value: [0.93869732 0.93869732 0.94594595 0.93846154 0.93869732 0.93869732
0.93155894 0.93461538 0.9280303 0.9351145 ]
mean value: 0.9368515883261834
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01068711 0.00849342 0.00775218 0.0076437 0.00741124 0.00742817
0.00740385 0.0076189 0.00729799 0.0073278 ]
mean value: 0.007906436920166016
key: score_time
value: [0.0128932 0.00827646 0.00841212 0.00812268 0.00794554 0.00796461
0.00785041 0.00781918 0.00777936 0.00783634]
mean value: 0.008489990234375
key: test_mcc
value: [0.77728159 0.68736396 0.77903565 0.56277738 0.47187011 0.58501794
0.72168784 0.65814518 0.70082556 0.65814518]
mean value: 0.6602150384851577
key: train_mcc
value: [0.66258992 0.65336491 0.67038524 0.68202471 0.62396093 0.66768511
0.66768511 0.72158618 0.67809175 0.67572951]
mean value: 0.6703103372008967
key: test_accuracy
value: [0.87719298 0.84210526 0.87719298 0.77192982 0.73214286 0.78571429
0.85714286 0.82142857 0.83928571 0.82142857]
mean value: 0.8225563909774436
key: train_accuracy
value: [0.82445759 0.81854043 0.82840237 0.83234714 0.79527559 0.82677165
0.82677165 0.85826772 0.83267717 0.83070866]
mean value: 0.8274219975461647
key: test_fscore
value: [0.85714286 0.83018868 0.8627451 0.74509804 0.70588235 0.76
0.84615385 0.8 0.81632653 0.8 ]
mean value: 0.8023537403350309
key: train_fscore
value: [0.80525164 0.79646018 0.80879121 0.81069042 0.75586854 0.80701754
0.80701754 0.84937238 0.81481481 0.81140351]
mean value: 0.8066687790927018
key: test_precision
value: [1. 0.88 1. 0.86363636 0.7826087 0.86363636
0.91666667 0.90909091 0.95238095 0.90909091]
mean value: 0.9077110860154338
key: train_precision
value: [0.90640394 0.90909091 0.91089109 0.92857143 0.93604651 0.91089109
0.91089109 0.90625 0.91219512 0.91584158]
mean value: 0.9147072763613312
key: test_recall
value: [0.75 0.78571429 0.75862069 0.65517241 0.64285714 0.67857143
0.78571429 0.71428571 0.71428571 0.71428571]
mean value: 0.7199507389162562
key: train_recall
value: [0.72440945 0.70866142 0.72727273 0.71936759 0.63385827 0.72440945
0.72440945 0.7992126 0.73622047 0.72834646]
mean value: 0.7226167875260652
key: test_roc_auc
value: [0.875 0.841133 0.87931034 0.77401478 0.73214286 0.78571429
0.85714286 0.82142857 0.83928571 0.82142857]
mean value: 0.8226600985221675
key: train_roc_auc
value: [0.82465532 0.81875759 0.82820329 0.83212474 0.79527559 0.82677165
0.82677165 0.85826772 0.83267717 0.83070866]
mean value: 0.8274213376489994
key: test_jcc
value: [0.75 0.70967742 0.75862069 0.59375 0.54545455 0.61290323
0.73333333 0.66666667 0.68965517 0.66666667]
mean value: 0.6726727719351467
key: train_jcc
value: [0.67399267 0.66176471 0.67896679 0.68164794 0.60754717 0.67647059
0.67647059 0.73818182 0.6875 0.68265683]
mean value: 0.6765199100649822
MCC on Blind test: 0.34
Accuracy on Blind test: 0.78
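The Gaussian NB fold-level metrics above are consistent with a single confusion matrix per fold. As a minimal sketch, assume the first CV fold held 57 samples with TP=21, FP=0, FN=7, TN=29. These counts are inferred here from the logged accuracy (0.87719298 = 50/57), recall (0.75) and precision (1.0); the script does not print them. The helper name `binary_metrics` is hypothetical. The remaining logged metrics for that fold can then be recomputed in plain Python:

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # For hard 0/1 predictions, ROC AUC equals the mean of recall and specificity.
        "roc_auc": (recall + specificity) / 2,
    }


# Inferred counts for Gaussian NB, CV fold 1 (an assumption, not logged output).
fold1 = binary_metrics(tp=21, fp=0, fn=7, tn=29)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

The logged test_roc_auc for this fold (0.875) matches the mean of recall and specificity, which suggests, but does not prove, that the script scored AUC on predicted labels rather than on probabilities.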
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00778794 0.00758052 0.00752497 0.00759459 0.00758076 0.00756574
0.00753331 0.00752449 0.00757456 0.0075686 ]
mean value: 0.00758354663848877
key: score_time
value: [0.00791955 0.00788665 0.00793982 0.00793576 0.00796843 0.00790358
0.00784445 0.00793791 0.00797367 0.00794005]
mean value: 0.007924985885620118
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
0.64285714 0.75047877 0.64450339 0.82195294]
mean value: 0.7625646979424463
key: train_mcc
value: [0.75941547 0.75148224 0.759525 0.75544282 0.77167747 0.77186893
0.77564465 0.77588525 0.78749923 0.76800824]
mean value: 0.7676449294755058
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.85964912 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8809837092731829
key: train_accuracy
value: [0.87968442 0.87573964 0.87968442 0.87771203 0.88582677 0.88582677
0.88779528 0.88779528 0.89370079 0.88385827]
mean value: 0.8837623662426812
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.86206897 0.86206897 0.87272727
0.82142857 0.87272727 0.82758621 0.90909091]
mean value: 0.8818381769470699
key: train_fscore
value: [0.88062622 0.8762279 0.88062622 0.87698413 0.88627451 0.88715953
0.88845401 0.88932039 0.89453125 0.88543689]
mean value: 0.8845641057179913
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
0.82142857 0.88888889 0.8 0.92592593]
mean value: 0.8779155263638022
key: train_precision
value: [0.87548638 0.8745098 0.87209302 0.88047809 0.8828125 0.87692308
0.88326848 0.87739464 0.8875969 0.87356322]
mean value: 0.8784126109194028
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8864532019704433
key: train_recall
value: [0.88582677 0.87795276 0.88932806 0.87351779 0.88976378 0.8976378
0.89370079 0.9015748 0.9015748 0.8976378 ]
mean value: 0.8908515141140954
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.85960591 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8810344827586207
key: train_roc_auc
value: [0.87967228 0.87573527 0.8797034 0.87770378 0.88582677 0.88582677
0.88779528 0.88779528 0.89370079 0.88385827]
mean value: 0.8837617876816781
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
0.6969697 0.77419355 0.70588235 0.83333333]
mean value: 0.7909401414524755
key: train_jcc
value: [0.78671329 0.77972028 0.78671329 0.78091873 0.79577465 0.7972028
0.79929577 0.8006993 0.80918728 0.79442509]
mean value: 0.7930650467759314
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
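Each `mean value` line is the arithmetic mean of the ten per-fold values printed above it. A minimal check in plain Python, using the logged `test_mcc` folds for BernoulliNB. The values are copied from the log at eight decimal places, so agreement holds only to about that precision. The standard deviation is added for illustration and is not printed by the script.

```python
from statistics import mean, stdev

# Per-fold test_mcc for BernoulliNB, copied from the log above.
test_mcc = [
    0.8953202, 0.82512315, 0.85960591, 0.71921182, 0.71611487,
    0.75047877, 0.64285714, 0.75047877, 0.64450339, 0.82195294,
]

fold_mean = mean(test_mcc)
fold_sd = stdev(test_mcc)  # sample standard deviation across the 10 folds

print(f"mean test_mcc: {fold_mean:.6f}")
print(f"sd   test_mcc: {fold_sd:.6f}")
```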
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00749707 0.0071063 0.00800991 0.00797606 0.00807238 0.00814319
0.00821209 0.00825953 0.0080924 0.00823736]
mean value: 0.007960629463195801
key: score_time
value: [0.01054406 0.01405478 0.01150608 0.0120914 0.0119555 0.01721978
0.01335192 0.0119431 0.01190829 0.01170444]
mean value: 0.012627935409545899
key: test_mcc
value: [0.8953202 0.78940887 0.71921182 0.79110556 0.75047877 0.68250015
0.60753044 0.75047877 0.58501794 0.82195294]
mean value: 0.7393005465274064
key: train_mcc
value: [0.78308641 0.78304441 0.77919572 0.79093074 0.79951627 0.78742599
0.80317451 0.80759374 0.80324922 0.78395685]
mean value: 0.7921173847894009
key: test_accuracy
value: [0.94736842 0.89473684 0.85964912 0.89473684 0.875 0.83928571
0.80357143 0.875 0.78571429 0.91071429]
mean value: 0.868577694235589
key: train_accuracy
value: [0.89151874 0.89151874 0.88954635 0.89546351 0.8996063 0.89370079
0.9015748 0.90354331 0.9015748 0.89173228]
mean value: 0.8959779620742673
key: test_fscore
value: [0.94736842 0.89285714 0.86206897 0.9 0.87719298 0.83018868
0.80701754 0.87272727 0.80645161 0.90909091]
mean value: 0.8704963529709496
key: train_fscore
value: [0.89236791 0.89151874 0.89019608 0.8950495 0.90097087 0.89411765
0.90196078 0.90522244 0.90234375 0.89361702]
mean value: 0.8967364740693871
key: test_precision
value: [0.93103448 0.89285714 0.86206897 0.87096774 0.86206897 0.88
0.79310345 0.88888889 0.73529412 0.92592593]
mean value: 0.8642209679323466
key: train_precision
value: [0.88715953 0.89328063 0.88326848 0.8968254 0.88888889 0.890625
0.8984375 0.88973384 0.89534884 0.878327 ]
mean value: 0.8901895107400759
key: test_recall
value: [0.96428571 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
0.82142857 0.85714286 0.89285714 0.89285714]
mean value: 0.8793103448275862
key: train_recall
value: [0.8976378 0.88976378 0.8972332 0.89328063 0.91338583 0.8976378
0.90551181 0.92125984 0.90944882 0.90944882]
mean value: 0.9034608322181071
key: test_roc_auc
value: [0.9476601 0.89470443 0.85960591 0.89408867 0.875 0.83928571
0.80357143 0.875 0.78571429 0.91071429]
mean value: 0.8685344827586207
key: train_roc_auc
value: [0.89150664 0.89152221 0.88956148 0.89545921 0.8996063 0.89370079
0.9015748 0.90354331 0.9015748 0.89173228]
mean value: 0.8959781830630855
key: test_jcc
value: [0.9 0.80645161 0.75757576 0.81818182 0.78125 0.70967742
0.67647059 0.77419355 0.67567568 0.83333333]
mean value: 0.7732809753647041
key: train_jcc
value: [0.80565371 0.80427046 0.80212014 0.81003584 0.81978799 0.80851064
0.82142857 0.82685512 0.82206406 0.80769231]
mean value: 0.8128418840416354
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01613593 0.01766896 0.01700807 0.01466203 0.01586533 0.01819515
0.01554966 0.01456881 0.01771259 0.01677132]
mean value: 0.016413784027099608
key: score_time
value: [0.00918293 0.01023507 0.00923419 0.00916266 0.01018572 0.01020145
0.00912857 0.00951362 0.0102036 0.00925684]
mean value: 0.009630465507507324
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.75462449 0.71611487 0.78772636
0.64285714 0.75047877 0.64450339 0.78772636]
mean value: 0.7734277700975402
key: train_mcc
value: [0.77528914 0.77528914 0.77932046 0.78708603 0.79537422 0.78376226
0.80337378 0.79163927 0.79926835 0.77574087]
mean value: 0.7866143511152437
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.87719298 0.85714286 0.89285714
0.82142857 0.875 0.82142857 0.89285714]
mean value: 0.8862468671679198
key: train_accuracy
value: [0.88757396 0.88757396 0.88954635 0.89349112 0.8976378 0.89173228
0.9015748 0.89566929 0.8996063 0.88779528]
mean value: 0.8932201152370747
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.88135593 0.86206897 0.88888889
0.82142857 0.87272727 0.82758621 0.88888889]
mean value: 0.8868716051414689
key: train_fscore
value: [0.88888889 0.88888889 0.890625 0.89411765 0.8984375 0.89320388
0.90272374 0.89708738 0.90019569 0.88888889]
mean value: 0.8943057505986216
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.86666667 0.83333333 0.92307692
0.82142857 0.88888889 0.8 0.92307692]
mean value: 0.8849574754747168
key: train_precision
value: [0.88030888 0.88030888 0.88030888 0.88715953 0.89147287 0.88122605
0.89230769 0.88505747 0.89494163 0.88030888]
mean value: 0.8853400773979657
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.85714286]
mean value: 0.8899014778325123
key: train_recall
value: [0.8976378 0.8976378 0.90118577 0.90118577 0.90551181 0.90551181
0.91338583 0.90944882 0.90551181 0.8976378 ]
mean value: 0.9034655006068906
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.87684729 0.85714286 0.89285714
0.82142857 0.875 0.82142857 0.89285714]
mean value: 0.886268472906404
key: train_roc_auc
value: [0.88755408 0.88755408 0.88956926 0.89350627 0.8976378 0.89173228
0.9015748 0.89566929 0.8996063 0.88779528]
mean value: 0.8932199433568828
key: test_jcc
value: [0.9 0.9 0.87096774 0.78787879 0.75757576 0.8
0.6969697 0.77419355 0.70588235 0.8 ]
mean value: 0.7993467885687999
key: train_jcc
value: [0.8 0.8 0.8028169 0.80851064 0.81560284 0.80701754
0.82269504 0.81338028 0.81850534 0.8 ]
mean value: 0.808852857567483
MCC on Blind test: 0.22
Accuracy on Blind test: 0.71
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.39372373 1.54294658 1.46989751 1.48421979 1.53672004 1.60921836
1.45766068 1.5458262 1.44554925 1.52575564]
mean value: 1.5011517763137818
key: score_time
value: [0.01374149 0.01342797 0.01947975 0.01363492 0.01389122 0.02115655
0.0138762 0.01382446 0.01419258 0.01371765]
mean value: 0.015094280242919922
key: test_mcc
value: [0.8951918 0.8953202 0.82490815 0.85960591 0.75047877 0.89802651
0.85933785 0.78772636 0.78772636 0.85714286]
mean value: 0.8415464773043235
key: train_mcc
value: [0.98028353 0.96055211 0.97239383 0.96055211 0.97640822 0.97244848
0.96463421 0.96850394 0.9645744 0.96853396]
mean value: 0.9688884814344612
key: test_accuracy
value: [0.94736842 0.94736842 0.9122807 0.92982456 0.875 0.94642857
0.92857143 0.89285714 0.89285714 0.92857143]
mean value: 0.9201127819548872
key: train_accuracy
value: [0.99013807 0.98027613 0.98619329 0.98027613 0.98818898 0.98622047
0.98228346 0.98425197 0.98228346 0.98425197]
mean value: 0.9844363944151951
key: test_fscore
value: [0.94545455 0.94736842 0.91525424 0.93103448 0.87719298 0.94339623
0.93103448 0.88888889 0.89655172 0.92857143]
mean value: 0.9204747419782038
key: train_fscore
value: [0.99017682 0.98031496 0.98613861 0.98023715 0.98814229 0.98619329
0.98217822 0.98425197 0.98224852 0.98418972]
mean value: 0.9844071562661963
key: test_precision
value: [0.96296296 0.93103448 0.9 0.93103448 0.86206897 1.
0.9 0.92307692 0.86666667 0.92857143]
mean value: 0.9205415912312465
key: train_precision
value: [0.98823529 0.98031496 0.98809524 0.98023715 0.99206349 0.98814229
0.98804781 0.98425197 0.98418972 0.98809524]
mean value: 0.9861673170230888
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
0.96428571 0.85714286 0.92857143 0.92857143]
mean value: 0.9219211822660098
key: train_recall
value: [0.99212598 0.98031496 0.98418972 0.98023715 0.98425197 0.98425197
0.97637795 0.98425197 0.98031496 0.98031496]
mean value: 0.9826631601879805
key: test_roc_auc
value: [0.94704433 0.9476601 0.91194581 0.92980296 0.875 0.94642857
0.92857143 0.89285714 0.89285714 0.92857143]
mean value: 0.9200738916256158
key: train_roc_auc
value: [0.99013414 0.98027606 0.98618935 0.98027606 0.98818898 0.98622047
0.98228346 0.98425197 0.98228346 0.98425197]
mean value: 0.9844355917960848
key: test_jcc
value: [0.89655172 0.9 0.84375 0.87096774 0.78125 0.89285714
0.87096774 0.8 0.8125 0.86666667]
mean value: 0.8535511017532709
key: train_jcc
value: [0.98054475 0.96138996 0.97265625 0.96124031 0.9765625 0.97276265
0.96498054 0.96899225 0.96511628 0.9688716 ]
mean value: 0.9693117081673194
MCC on Blind test: 0.26
Accuracy on Blind test: 0.66
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01417804 0.01202106 0.01139545 0.01080203 0.01007557 0.01062059
0.01066804 0.01060534 0.01126242 0.01187325]
mean value: 0.011350178718566894
key: score_time
value: [0.01092696 0.00883508 0.00887847 0.00816321 0.00810766 0.00819612
0.00795102 0.00797558 0.00864434 0.00838518]
mean value: 0.008606362342834472
key: test_mcc
value: [0.93202124 0.8951918 0.85960591 0.8953202 0.75434227 0.96490128
0.75434227 0.89342711 0.96490128 0.92857143]
mean value: 0.8842624793067261
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.94736842 0.92982456 0.94736842 0.875 0.98214286
0.875 0.94642857 0.98214286 0.96428571]
mean value: 0.9414473684210526
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96296296 0.94545455 0.93103448 0.94736842 0.88135593 0.98181818
0.88135593 0.94736842 0.98181818 0.96428571]
mean value: 0.942482277561025
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.93103448 0.96428571 0.83870968 1.
0.83870968 0.93103448 1. 0.96428571]
mean value: 0.9431022711890342
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.92857143 0.93103448 0.93103448 0.92857143 0.96428571
0.92857143 0.96428571 0.96428571 0.96428571]
mean value: 0.9433497536945813
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96428571 0.94704433 0.92980296 0.9476601 0.875 0.98214286
0.875 0.94642857 0.98214286 0.96428571]
mean value: 0.9413793103448276
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92857143 0.89655172 0.87096774 0.9 0.78787879 0.96428571
0.78787879 0.9 0.96428571 0.93103448]
mean value: 0.8931454381732469
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10496068 0.10379076 0.104743 0.10231304 0.1054554 0.10581684
0.10448885 0.10502958 0.10399008 0.10775542]
mean value: 0.10483436584472657
key: score_time
value: [0.01817036 0.01749301 0.01778865 0.01884627 0.01766968 0.01870561
0.01813245 0.01771808 0.01825023 0.01763487]
mean value: 0.018040919303894044
key: test_mcc
value: [0.8953202 0.86189955 0.85960591 0.82490815 0.75434227 0.96490128
0.82618439 0.82195294 0.68250015 0.92857143]
mean value: 0.8420186261363041
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.92982456 0.92982456 0.9122807 0.875 0.98214286
0.91071429 0.91071429 0.83928571 0.96428571]
mean value: 0.9201441102756892
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94736842 0.93103448 0.93103448 0.91525424 0.88135593 0.98245614
0.91525424 0.90909091 0.84745763 0.96428571]
mean value: 0.9224592184195679
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.93103448 0.9 0.93103448 0.9 0.83870968 0.96551724
0.87096774 0.92592593 0.80645161 0.96428571]
mean value: 0.9033926879366256
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1.
0.96428571 0.89285714 0.89285714 0.96428571]
mean value: 0.9433497536945813
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9476601 0.93041872 0.92980296 0.91194581 0.875 0.98214286
0.91071429 0.91071429 0.83928571 0.96428571]
mean value: 0.9201970443349754
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.9 0.87096774 0.87096774 0.84375 0.78787879 0.96551724
0.84375 0.83333333 0.73529412 0.93103448]
mean value: 0.8582493446868079
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.71
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.0083456 0.00800943 0.00787878 0.00753284 0.00806522 0.00796127
0.00782919 0.00870728 0.00797272 0.00807309]
mean value: 0.008037543296813965
key: score_time
value: [0.00834203 0.00855613 0.00791216 0.00868464 0.00868344 0.00816584
0.00837827 0.00838041 0.00823331 0.00819325]
mean value: 0.008352947235107423
key: test_mcc
value: [0.8951918 0.68850906 0.79110556 0.78940887 0.57142857 0.65814518
0.4330127 0.85714286 0.78772636 0.64450339]
mean value: 0.7116174353174761
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.84210526 0.89473684 0.89473684 0.78571429 0.82142857
0.71428571 0.92857143 0.89285714 0.82142857]
mean value: 0.8543233082706767
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94545455 0.84745763 0.9 0.89655172 0.78571429 0.8
0.73333333 0.92857143 0.88888889 0.81481481]
mean value: 0.8540786648033872
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96296296 0.80645161 0.87096774 0.89655172 0.78571429 0.90909091
0.6875 0.92857143 0.92307692 0.84615385]
mean value: 0.8617041434546996
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.89285714 0.93103448 0.89655172 0.78571429 0.71428571
0.78571429 0.92857143 0.85714286 0.78571429]
mean value: 0.850615763546798
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94704433 0.8429803 0.89408867 0.89470443 0.78571429 0.82142857
0.71428571 0.92857143 0.89285714 0.82142857]
mean value: 0.8543103448275862
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89655172 0.73529412 0.81818182 0.8125 0.64705882 0.66666667
0.57894737 0.86666667 0.8 0.6875 ]
mean value: 0.7509367185250606
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.3327291 1.3053844 1.30537295 1.2954855 1.29172778 1.30082202
1.30300689 1.31275725 1.33773541 1.36065793]
mean value: 1.3145679235458374
key: score_time
value: [0.09119868 0.0915029 0.14295626 0.09044981 0.09054136 0.09039283
0.09088302 0.09034443 0.09237862 0.09947395]
mean value: 0.09701218605041503
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.8951918 0.85933785 1.
0.92857143 0.89342711 0.93094934 0.92857143]
mean value: 0.922664756643307
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.93103448 1.
0.96428571 0.94736842 0.96296296 0.96428571]
mean value: 0.961379368196865
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.96551724 0.93333333 0.9 1.
0.96428571 0.93103448 1. 0.96428571]
mean value: 0.9589490968801314
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9645320197044335
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.94704433 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.90322581 0.87096774 1.
0.93103448 0.9 0.92857143 0.93103448]
mean value: 0.9262452990094814
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.48
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.90509057 0.91448712 0.93358636 0.96710658 0.93032479 0.91889691
0.90740323 0.95062542 0.90149426 0.91975093]
mean value: 0.924876618385315
key: score_time
value: [0.17131925 0.23245525 0.21618485 0.23573542 0.27564526 0.17861819
0.19408727 0.25907493 0.20869422 0.24864411]
mean value: 0.2220458745956421
key: test_mcc
value: [0.96547546 0.8953202 0.92980296 0.8951918 0.85714286 1.
0.92857143 0.89342711 0.93094934 0.92857143]
mean value: 0.9224452574728608
key: train_mcc
value: [0.94890036 0.95277969 0.94878539 0.95278262 0.95687833 0.94112724
0.94499908 0.95278544 0.94101052 0.94900279]
mean value: 0.9489051458683196
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.9609962406015038
key: train_accuracy
value: [0.97435897 0.97633136 0.97435897 0.97633136 0.97834646 0.97047244
0.97244094 0.97637795 0.97047244 0.97440945]
mean value: 0.974390035565081
key: test_fscore
value: [0.98181818 0.94736842 0.96551724 0.94915254 0.92857143 1.
0.96428571 0.94736842 0.96296296 0.96428571]
mean value: 0.9611330627781457
key: train_fscore
value: [0.97465887 0.9765625 0.97445972 0.97647059 0.9785575 0.97076023
0.97265625 0.97647059 0.97064579 0.97465887]
mean value: 0.9745900921567919
key: test_precision
value: [1. 0.93103448 0.96551724 0.93333333 0.92857143 1.
0.96428571 0.93103448 1. 0.96428571]
mean value: 0.9618062397372742
key: train_precision
value: [0.96525097 0.96899225 0.96875 0.9688716 0.96911197 0.96138996
0.96511628 0.97265625 0.96498054 0.96525097]
mean value: 0.9670370778213465
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9609605911330049
key: train_recall
value: [0.98425197 0.98425197 0.98023715 0.98418972 0.98818898 0.98031496
0.98031496 0.98031496 0.97637795 0.98425197]
mean value: 0.9822694594005789
key: test_roc_auc
value: [0.98214286 0.9476601 0.96490148 0.94704433 0.92857143 1.
0.96428571 0.94642857 0.96428571 0.96428571]
mean value: 0.960960591133005
key: train_roc_auc
value: [0.97433942 0.97631571 0.97437055 0.97634683 0.97834646 0.97047244
0.97244094 0.97637795 0.97047244 0.97440945]
mean value: 0.9743892191341695
key: test_jcc
value: [0.96428571 0.9 0.93333333 0.90322581 0.86666667 1.
0.93103448 0.9 0.92857143 0.93103448]
mean value: 0.9258151914825997
key: train_jcc
value: [0.95057034 0.95419847 0.95019157 0.95402299 0.95801527 0.94318182
0.94676806 0.95402299 0.94296578 0.95057034]
mean value: 0.9504507631247383
MCC on Blind test: 0.21
Accuracy on Blind test: 0.52
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01842809 0.00753212 0.00757003 0.00761199 0.0075736 0.00752425
0.00748992 0.00754261 0.00776935 0.00763273]
mean value: 0.008667469024658203
key: score_time
value: [0.01342535 0.00787449 0.00798821 0.007864 0.00845551 0.00779438
0.00783849 0.00777411 0.00842023 0.00789094]
mean value: 0.008532571792602538
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.71921182 0.71611487 0.75047877
0.64285714 0.75047877 0.64450339 0.82195294]
mean value: 0.7625646979424463
key: train_mcc
value: [0.75941547 0.75148224 0.759525 0.75544282 0.77167747 0.77186893
0.77564465 0.77588525 0.78749923 0.76800824]
mean value: 0.7676449294755058
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.85964912 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8809837092731829
key: train_accuracy
value: [0.87968442 0.87573964 0.87968442 0.87771203 0.88582677 0.88582677
0.88779528 0.88779528 0.89370079 0.88385827]
mean value: 0.8837623662426812
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.86206897 0.86206897 0.87272727
0.82142857 0.87272727 0.82758621 0.90909091]
mean value: 0.8818381769470699
key: train_fscore
value: [0.88062622 0.8762279 0.88062622 0.87698413 0.88627451 0.88715953
0.88845401 0.88932039 0.89453125 0.88543689]
mean value: 0.8845641057179913
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.86206897 0.83333333 0.88888889
0.82142857 0.88888889 0.8 0.92592593]
mean value: 0.8779155263638022
key: train_precision
value: [0.87548638 0.8745098 0.87209302 0.88047809 0.8828125 0.87692308
0.88326848 0.87739464 0.8875969 0.87356322]
mean value: 0.8784126109194028
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.86206897 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8864532019704433
key: train_recall
value: [0.88582677 0.87795276 0.88932806 0.87351779 0.88976378 0.8976378
0.89370079 0.9015748 0.9015748 0.8976378 ]
mean value: 0.8908515141140954
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.85960591 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8810344827586207
key: train_roc_auc
value: [0.87967228 0.87573527 0.8797034 0.87770378 0.88582677 0.88582677
0.88779528 0.88779528 0.89370079 0.88385827]
mean value: 0.8837617876816781
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.75757576 0.75757576 0.77419355
0.6969697 0.77419355 0.70588235 0.83333333]
mean value: 0.7909401414524755
key: train_jcc
value: [0.78671329 0.77972028 0.78671329 0.78091873 0.79577465 0.7972028
0.79929577 0.8006993 0.80918728 0.79442509]
mean value: 0.7930650467759314
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.06567097 0.04970622 0.05028796 0.05296206 0.05908322 0.0577023
0.05627537 0.05472136 0.06450558 0.06165719]
mean value: 0.057257223129272464
key: score_time
value: [0.00984359 0.00965667 0.00961947 0.01044655 0.01020241 0.01003504
0.01031113 0.00977564 0.01015902 0.00963831]
mean value: 0.009968781471252441
key: test_mcc
value: [0.96547546 0.8951918 0.92980296 0.8951918 0.89342711 1.
0.96490128 0.89342711 0.96490128 0.92857143]
mean value: 0.9330890233388842
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.94736842 0.96491228 0.94736842 0.94642857 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9663533834586466
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.94545455 0.96551724 0.94915254 0.94736842 1.
0.98245614 0.94736842 0.98181818 0.96428571]
mean value: 0.9665239389584955
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96296296 0.96551724 0.93333333 0.93103448 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9653685458857872
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 0.96551724 0.96551724 0.96428571 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.9681034482758621
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.94704433 0.96490148 0.94704433 0.94642857 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.966256157635468
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.89655172 0.93333333 0.90322581 0.9 1.
0.96551724 0.9 0.96428571 0.93103448]
mean value: 0.9358234016632236
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.37
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01330185 0.04045773 0.04038882 0.04065561 0.04066443 0.04291534
0.04127479 0.04550576 0.04016542 0.04041195]
mean value: 0.03857417106628418
key: score_time
value: [0.01009989 0.01929498 0.01896739 0.01059246 0.01052094 0.01064014
0.02125072 0.01962495 0.01934791 0.01900911]
mean value: 0.015934848785400392
key: test_mcc
value: [0.85960591 0.8953202 0.85960591 0.82490815 0.75434227 0.82195294
0.71611487 0.71611487 0.64450339 0.85933785]
mean value: 0.7951806363828539
key: train_mcc
value: [0.87014673 0.87036164 0.85437653 0.85842397 0.8746939 0.85134433
0.83910959 0.85465533 0.87089581 0.85513299]
mean value: 0.8599140831820147
key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807 0.875 0.91071429
0.85714286 0.85714286 0.82142857 0.92857143]
mean value: 0.8969298245614035
key: train_accuracy
value: [0.93491124 0.93491124 0.9270217 0.92899408 0.93700787 0.92519685
0.91929134 0.92716535 0.93503937 0.92716535]
mean value: 0.9296704406032086
key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
0.86206897 0.85185185 0.82758621 0.92592593]
mean value: 0.8980108361156687
key: train_fscore
value: [0.93592233 0.93617021 0.92787524 0.92996109 0.93822394 0.92692308
0.92069632 0.92815534 0.93641618 0.92870906]
mean value: 0.9309052796774193
key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9 0.83870968 0.92592593
0.83333333 0.88461538 0.8 0.96153846]
mean value: 0.893476317692113
key: train_precision
value: [0.92337165 0.92015209 0.91538462 0.91570881 0.92045455 0.90601504
0.90494297 0.91570881 0.91698113 0.90943396]
mean value: 0.914815362183764
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.89285714 0.82142857 0.85714286 0.89285714]
mean value: 0.904064039408867
key: train_recall
value: [0.9488189 0.95275591 0.94071146 0.94466403 0.95669291 0.9488189
0.93700787 0.94094488 0.95669291 0.9488189 ]
mean value: 0.9475926675173508
key: test_roc_auc
value: [0.92980296 0.9476601 0.92980296 0.91194581 0.875 0.91071429
0.85714286 0.85714286 0.82142857 0.92857143]
mean value: 0.8969211822660099
key: train_roc_auc
value: [0.93488376 0.93487598 0.92704864 0.92902493 0.93700787 0.92519685
0.91929134 0.92716535 0.93503937 0.92716535]
mean value: 0.9296699449130124
key: test_jcc
value: [0.86666667 0.9 0.87096774 0.84375 0.78787879 0.83333333
0.75757576 0.74193548 0.70588235 0.86206897]
mean value: 0.8170059089719415
key: train_jcc
value: [0.87956204 0.88 0.86545455 0.86909091 0.88363636 0.86379928
0.85304659 0.86594203 0.88043478 0.86690647]
mean value: 0.8707873026527986
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01948237 0.00829005 0.0076189 0.00788856 0.00796366 0.00804019
0.00793099 0.00874853 0.00803947 0.00794411]
mean value: 0.009194684028625489
key: score_time
value: [0.00859404 0.0083673 0.00849128 0.00827527 0.00781775 0.00821924
0.00822783 0.00834608 0.00824499 0.00831842]
mean value: 0.0082902193069458
key: test_mcc
value: [0.8953202 0.82512315 0.85960591 0.78940887 0.71611487 0.75047877
0.64285714 0.75047877 0.64450339 0.82195294]
mean value: 0.7695844023759438
key: train_mcc
value: [0.75941547 0.76333276 0.76341509 0.77515483 0.77955173 0.77186893
0.78749923 0.77962424 0.78742599 0.76786532]
mean value: 0.7735153587774711
key: test_accuracy
value: [0.94736842 0.9122807 0.92982456 0.89473684 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8844924812030075
key: train_accuracy
value: [0.87968442 0.8816568 0.8816568 0.88757396 0.88976378 0.88582677
0.89370079 0.88976378 0.89370079 0.88385827]
mean value: 0.88671861653388
key: test_fscore
value: [0.94736842 0.9122807 0.93103448 0.89655172 0.86206897 0.87272727
0.82142857 0.87272727 0.82758621 0.90909091]
mean value: 0.8852864528091389
key: train_fscore
value: [0.88062622 0.88235294 0.88235294 0.88757396 0.89019608 0.88715953
0.89453125 0.890625 0.89411765 0.88499025]
mean value: 0.8874525831917391
key: test_precision
value: [0.93103448 0.89655172 0.93103448 0.89655172 0.83333333 0.88888889
0.82142857 0.88888889 0.8 0.92592593]
mean value: 0.8813638022258712
key: train_precision
value: [0.87548638 0.87890625 0.87548638 0.88582677 0.88671875 0.87692308
0.8875969 0.88372093 0.890625 0.87644788]
mean value: 0.8817738317127776
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8899014778325123
key: train_recall
value: [0.88582677 0.88582677 0.88932806 0.88932806 0.89370079 0.8976378
0.9015748 0.8976378 0.8976378 0.89370079]
mean value: 0.8932199433568827
key: test_roc_auc
value: [0.9476601 0.91256158 0.92980296 0.89470443 0.85714286 0.875
0.82142857 0.875 0.82142857 0.91071429]
mean value: 0.8845443349753694
key: train_roc_auc
value: [0.87967228 0.88164856 0.88167191 0.88757742 0.88976378 0.88582677
0.89370079 0.88976378 0.89370079 0.88385827]
mean value: 0.8867184339111761
key: test_jcc
value: [0.9 0.83870968 0.87096774 0.8125 0.75757576 0.77419355
0.6969697 0.77419355 0.70588235 0.83333333]
mean value: 0.7964325656948996
key: train_jcc
value: [0.78671329 0.78947368 0.78947368 0.79787234 0.80212014 0.7972028
0.80918728 0.8028169 0.80851064 0.79370629]
mean value: 0.7977077046669985
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00967669 0.01218224 0.01342869 0.01291203 0.01177907 0.01280165
0.01166844 0.01290774 0.01253223 0.01339507]
mean value: 0.012328386306762695
key: score_time
value: [0.00771666 0.00980401 0.00986791 0.01040888 0.01046586 0.01049089
0.010355 0.01109457 0.01037741 0.01041889]
mean value: 0.010100007057189941
key: test_mcc
value: [0.93202124 0.8953202 0.82942474 0.86189955 0.75047877 1.
0.79385662 0.78571429 0.75047877 0.78571429]
mean value: 0.8384908463006829
key: train_mcc
value: [0.90172947 0.91347458 0.90633247 0.84245181 0.90979438 0.87444958
0.84046723 0.88616336 0.8819171 0.87366794]
mean value: 0.8830447927155709
key: test_accuracy
value: [0.96491228 0.94736842 0.9122807 0.92982456 0.875 1.
0.89285714 0.89285714 0.875 0.89285714]
mean value: 0.9182957393483709
key: train_accuracy
value: [0.95069034 0.9566075 0.95266272 0.92110454 0.95472441 0.93700787
0.91929134 0.94291339 0.94094488 0.93503937]
mean value: 0.9410986348599916
key: test_fscore
value: [0.96296296 0.94736842 0.90909091 0.92857143 0.87272727 1.
0.9 0.89285714 0.87719298 0.89285714]
mean value: 0.9183628262575632
key: train_fscore
value: [0.9500998 0.9561753 0.951417 0.921875 0.95409182 0.936
0.92190476 0.94368932 0.94117647 0.93785311]
mean value: 0.941428257984581
key: test_precision
value: [1. 0.93103448 0.96153846 0.96296296 0.88888889 1.
0.84375 0.89285714 0.86206897 0.89285714]
mean value: 0.9235958047380461
key: train_precision
value: [0.96356275 0.96774194 0.97510373 0.91119691 0.96761134 0.95121951
0.89298893 0.93103448 0.9375 0.89891697]
mean value: 0.9396876562541508
key: test_recall
value: [0.92857143 0.96428571 0.86206897 0.89655172 0.85714286 1.
0.96428571 0.89285714 0.89285714 0.89285714]
mean value: 0.9151477832512316
key: train_recall
value: [0.93700787 0.94488189 0.92885375 0.93280632 0.94094488 0.92125984
0.95275591 0.95669291 0.94488189 0.98031496]
mean value: 0.9440400236531699
key: test_roc_auc
value: [0.96428571 0.9476601 0.91317734 0.93041872 0.875 1.
0.89285714 0.89285714 0.875 0.89285714]
mean value: 0.9184113300492611
key: train_roc_auc
value: [0.95071738 0.95663067 0.95261585 0.92112757 0.95472441 0.93700787
0.91929134 0.94291339 0.94094488 0.93503937]
mean value: 0.9411012729140082
key: test_jcc
value: [0.92857143 0.9 0.83333333 0.86666667 0.77419355 1.
0.81818182 0.80645161 0.78125 0.80645161]
mean value: 0.8515100020946795
key: train_jcc
value: [0.90494297 0.91603053 0.90733591 0.85507246 0.91221374 0.87969925
0.85512367 0.89338235 0.88888889 0.88297872]
mean value: 0.8895668499958933
MCC on Blind test: 0.19
Accuracy on Blind test: 0.58
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01474547 0.01279259 0.01564717 0.01446366 0.01306367 0.01507711
0.01365781 0.01378751 0.01366973 0.01418114]
mean value: 0.014108586311340331
key: score_time
value: [0.01045132 0.01075387 0.0109086 0.01076341 0.01093888 0.01090598
0.01117682 0.01138139 0.01105213 0.01108289]
mean value: 0.010941529273986816
key: test_mcc
value: [0.8951918 0.92980296 0.8951918 0.8953202 0.64951905 0.8660254
0.70082556 0.82195294 0.89342711 0.92857143]
mean value: 0.8475828256723696
key: train_mcc
value: [0.91324443 0.8974355 0.9215681 0.93352251 0.878014 0.86150531
0.84768598 0.89200643 0.92554839 0.91732994]
mean value: 0.8987860594733551
key: test_accuracy
value: [0.94736842 0.96491228 0.94736842 0.94736842 0.82142857 0.92857143
0.83928571 0.91071429 0.94642857 0.96428571]
mean value: 0.9217731829573934
key: train_accuracy
value: [0.9566075 0.94871795 0.96055227 0.96646943 0.93897638 0.92913386
0.92125984 0.94488189 0.96259843 0.95866142]
mean value: 0.948785895106307
key: test_fscore
value: [0.94545455 0.96428571 0.94915254 0.94736842 0.83333333 0.93333333
0.85714286 0.9122807 0.94545455 0.96428571]
mean value: 0.9252091708469943
key: train_fscore
value: [0.95652174 0.9488189 0.96108949 0.96579477 0.93933464 0.93207547
0.92537313 0.94676806 0.96207585 0.95857988]
mean value: 0.9496431934331271
key: test_precision
value: [0.96296296 0.96428571 0.93333333 0.96428571 0.78125 0.875
0.77142857 0.89655172 0.96296296 0.96428571]
mean value: 0.9076346697682904
key: train_precision
value: [0.96031746 0.9488189 0.94636015 0.98360656 0.93385214 0.89492754
0.87943262 0.91544118 0.9757085 0.96047431]
mean value: 0.9398939355807465
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.89285714 1.
0.96428571 0.92857143 0.92857143 0.96428571]
mean value: 0.9467980295566503
key: train_recall
value: [0.95275591 0.9488189 0.97628458 0.9486166 0.94488189 0.97244094
0.97637795 0.98031496 0.9488189 0.95669291]
mean value: 0.9606003547975476
key: test_roc_auc
value: [0.94704433 0.96490148 0.94704433 0.9476601 0.82142857 0.92857143
0.83928571 0.91071429 0.94642857 0.96428571]
mean value: 0.9217364532019705
key: train_roc_auc
value: [0.95661511 0.94871775 0.96058324 0.96643428 0.93897638 0.92913386
0.92125984 0.94488189 0.96259843 0.95866142]
mean value: 0.9487862189163113
key: test_jcc
value: [0.89655172 0.93103448 0.90322581 0.9 0.71428571 0.875
0.75 0.83870968 0.89655172 0.93103448]
mean value: 0.8636393611949785
key: train_jcc
value: [0.91666667 0.90262172 0.92509363 0.93385214 0.88560886 0.87279152
0.86111111 0.89891697 0.92692308 0.92045455]
mean value: 0.904404023907068
MCC on Blind test: 0.32
Accuracy on Blind test: 0.78
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.1140337 0.10048437 0.10199213 0.10053515 0.10115218 0.10382915
0.10217881 0.09598231 0.0961473 0.09758639]
mean value: 0.10139214992523193
key: score_time
value: [0.01537442 0.01496673 0.01581311 0.01557422 0.01550794 0.01505017
0.01430726 0.01472378 0.01485848 0.01430631]
mean value: 0.01504824161529541
key: test_mcc
value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82618439 1.
0.96490128 0.93094934 0.92857143 0.85933785]
mean value: 0.9267046893623845
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1.
0.98214286 0.96428571 0.96428571 0.92857143]
mean value: 0.9627192982456141
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.96551724 0.96666667 0.91525424 1.
0.98245614 0.96551724 0.96428571 0.93103448]
mean value: 0.9636835620212532
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96551724 0.93548387 0.87096774 1.
0.96551724 0.93333333 0.96428571 0.9 ]
mean value: 0.9499390857566609
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.96428571 1.
1. 1. 0.96428571 0.96428571]
mean value: 0.9786945812807882
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1.
0.98214286 0.96428571 0.96428571 0.92857143]
mean value: 0.9626231527093597
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.93333333 0.93548387 0.84375 1.
0.96551724 0.93333333 0.93103448 0.87096774]
mean value: 0.9308740200752158
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.39
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03926826 0.03829074 0.04713559 0.05223989 0.04820871 0.04241133
0.03772259 0.03749561 0.03793573 0.04857802]
mean value: 0.042928647994995114
key: score_time
value: [0.0239563 0.02649641 0.02141261 0.03752351 0.02890968 0.03851295
0.02284622 0.0233736 0.02389741 0.0230751 ]
mean value: 0.02700037956237793
key: test_mcc
value: [0.96547546 0.92980296 0.8953202 0.93202124 0.82618439 1.
0.96490128 0.89342711 0.93094934 0.92857143]
mean value: 0.9266653398520664
key: train_mcc
value: [0.99214142 0.99211042 0.99214118 1. 0.98819663 0.98428248
0.98825791 1. 0.99212598 0.98819663]
mean value: 0.991745267193298
key: test_accuracy
value: [0.98245614 0.96491228 0.94736842 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9627506265664161
key: train_accuracy
value: [0.99605523 0.99605523 0.99605523 1. 0.99409449 0.99212598
0.99409449 1. 0.99606299 0.99409449]
mean value: 0.9958638121418255
key: test_fscore
value: [0.98181818 0.96428571 0.94736842 0.96666667 0.91525424 1.
0.98245614 0.94736842 0.96296296 0.96428571]
mean value: 0.9632466459763516
key: train_fscore
value: [0.99604743 0.99606299 0.99603175 1. 0.99408284 0.99209486
0.99405941 1. 0.99606299 0.99408284]
mean value: 0.99585251091878
key: test_precision
value: [1. 0.96428571 0.96428571 0.93548387 0.87096774 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9595860479898299
key: train_precision
value: [1. 0.99606299 1. 1. 0.99604743 0.99603175
1. 1. 0.99606299 0.99604743]
mean value: 0.9980252591943793
key: test_recall
value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1.
1. 0.96428571 0.92857143 0.96428571]
mean value: 0.968103448275862
key: train_recall
value: [0.99212598 0.99606299 0.99209486 1. 0.99212598 0.98818898
0.98818898 1. 0.99606299 0.99212598]
mean value: 0.9936976751423858
key: test_roc_auc
value: [0.98214286 0.96490148 0.9476601 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.96428571 0.96428571]
mean value: 0.9626847290640395
key: train_roc_auc
value: [0.99606299 0.99605521 0.99604743 1. 0.99409449 0.99212598
0.99409449 1. 0.99606299 0.99409449]
mean value: 0.9958638075378917
key: test_jcc
value: [0.96428571 0.93103448 0.9 0.93548387 0.84375 1.
0.96551724 0.9 0.92857143 0.93103448]
mean value: 0.9299677220721436
key: train_jcc
value: [0.99212598 0.99215686 0.99209486 1. 0.98823529 0.98431373
0.98818898 1. 0.99215686 0.98823529]
mean value: 0.9917507861505687
MCC on Blind test: 0.14
Accuracy on Blind test: 0.37
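Note on the OOB warnings logged for the Bagging Classifier run above: the estimator repr shows no n_estimators, so scikit-learn's default of 10 applies. With only 10 bootstrap samples, some rows can be in-bag for every estimator and so receive no OOB prediction; their all-zero prediction rows are what produce the 0/0 in the true_divide RuntimeWarning. The sketch below estimates how many such rows to expect. It assumes plain bootstrap sampling of N rows with replacement and a training fold of 507 rows, which is inferred from the train_accuracy values (0.99605523 is 505/507) rather than stated in the log. The function name is illustrative.

```python
def expected_rows_without_oob(n_samples: int, n_estimators: int) -> float:
    """Expected number of rows that are in-bag for every estimator.

    Assumes each estimator draws n_samples rows with replacement
    (an assumption about the bagging setup, not read from the log).
    """
    # Probability that one given row appears in one bootstrap sample.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    # A row has no OOB estimate only if it is in-bag for all estimators.
    return n_samples * p_in_bag ** n_estimators


# 507 is the inferred training-fold size; 10 is the scikit-learn default.
for n_estimators in (10, 50, 100):
    print(n_estimators, expected_rows_without_oob(507, n_estimators))
```

Under these assumptions, roughly five rows per fold have no OOB estimate with 10 estimators, and raising n_estimators makes that expectation negligible.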
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16959524 0.2451508 0.17405295 0.16930294 0.16698885 0.11223626
0.10615253 0.17977715 0.10680389 0.14715362]
mean value: 0.1577214241027832
key: score_time
value: [0.0270555 0.02043033 0.02078581 0.0202899 0.02019739 0.01269197
0.01305366 0.02091789 0.01312304 0.02039123]
mean value: 0.018893671035766602
key: test_mcc
value: [0.8953202 0.86189955 0.82512315 0.82490815 0.75047877 0.78571429
0.64450339 0.75047877 0.64951905 0.85714286]
mean value: 0.7845088175007775
key: train_mcc
value: [0.85051239 0.85019923 0.84231823 0.8428767 0.85465533 0.84293789
0.84677832 0.85513299 0.87062545 0.84677832]
mean value: 0.8502814833818734
key: test_accuracy
value: [0.94736842 0.92982456 0.9122807 0.9122807 0.875 0.89285714
0.82142857 0.875 0.82142857 0.92857143]
mean value: 0.8916040100250626
key: train_accuracy
value: [0.92504931 0.92504931 0.92110454 0.92110454 0.92716535 0.92125984
0.92322835 0.92716535 0.93503937 0.92322835]
mean value: 0.924939430648092
key: test_fscore
value: [0.94736842 0.93103448 0.9122807 0.91525424 0.87719298 0.89285714
0.82758621 0.87272727 0.83333333 0.92857143]
mean value: 0.8938206209695644
key: train_fscore
value: [0.92635659 0.92578125 0.92156863 0.92248062 0.92815534 0.92248062
0.92427184 0.92870906 0.93617021 0.92427184]
mean value: 0.9260246004677202
key: test_precision
value: [0.93103448 0.9 0.92857143 0.9 0.86206897 0.89285714
0.8 0.88888889 0.78125 0.92857143]
mean value: 0.8813242337164751
key: train_precision
value: [0.91221374 0.91860465 0.91439689 0.90494297 0.91570881 0.90839695
0.91187739 0.90943396 0.92015209 0.91187739]
mean value: 0.9127604846176163
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.89285714
0.85714286 0.85714286 0.89285714 0.92857143]
mean value: 0.9077586206896552
key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.94071146 0.94094488 0.93700787
0.93700787 0.9488189 0.95275591 0.93700787]
mean value: 0.9397124272509414
key: test_roc_auc
value: [0.9476601 0.93041872 0.91256158 0.91194581 0.875 0.89285714
0.82142857 0.875 0.82142857 0.92857143]
mean value: 0.8916871921182267
key: train_roc_auc
value: [0.9250179 0.92503346 0.92111979 0.92114313 0.92716535 0.92125984
0.92322835 0.92716535 0.93503937 0.92322835]
mean value: 0.9249400890106128
key: test_jcc
value: [0.9 0.87096774 0.83870968 0.84375 0.78125 0.80645161
0.70588235 0.77419355 0.71428571 0.86666667]
mean value: 0.8102157314538718
key: train_jcc
value: [0.86281588 0.86181818 0.85454545 0.85611511 0.86594203 0.85611511
0.85920578 0.86690647 0.88 0.85920578]
mean value: 0.862266979281973
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
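Note on the test_fscore and test_jcc rows: for binary labels, the Jaccard index J and the F1 score F are tied by J = F / (2 - F), so the two rows carry the same information. This assumes test_jcc is the Jaccard score and test_fscore is F1, which the key names suggest but the log does not state. The check below uses the first three Gaussian Process folds printed above.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the corresponding Jaccard index."""
    return f1 / (2.0 - f1)


# First three Gaussian Process folds, copied from the log above.
test_fscore = [0.94736842, 0.93103448, 0.9122807]
test_jcc = [0.9, 0.87096774, 0.83870968]

# Largest gap between the converted F1 values and the logged Jaccard values.
max_gap = max(
    abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc)
)
print(max_gap)
```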
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.26614237 0.2468183 0.2485559 0.24616241 0.24786353 0.24828482
0.25354338 0.26733065 0.25087976 0.24782729]
mean value: 0.25234084129333495
key: score_time
value: [0.00884628 0.00858235 0.00861478 0.00854993 0.0087781 0.00885868
0.00969672 0.00946689 0.0085597 0.00855279]
mean value: 0.008850622177124023
key: test_mcc
value: [0.96547546 0.92980296 0.96547546 0.93202124 0.82195294 1.
0.96490128 0.89342711 0.96490128 0.92857143]
mean value: 0.9366529157151744
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.98245614 0.96491228 0.98245614 0.96491228 0.91071429 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9680451127819548
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.98181818 0.96428571 0.98305085 0.96666667 0.9122807 1.
0.98245614 0.94736842 0.98181818 0.96428571]
mean value: 0.9684030569489981
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96666667 0.93548387 0.89655172 1.
0.96551724 0.93103448 1. 0.96428571]
mean value: 0.9623825414481699
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 1. 1. 0.92857143 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.975
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.98214286 0.96490148 0.98214286 0.96428571 0.91071429 1.
0.98214286 0.94642857 0.98214286 0.96428571]
mean value: 0.9679187192118227
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.96428571 0.93103448 0.96666667 0.93548387 0.83870968 1.
0.96551724 0.9 0.96428571 0.93103448]
mean value: 0.9397017850521744
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.3
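Note on reading these records: each metric is logged as a key line, a bracketed value array that may wrap over two lines, and a mean value line. A minimal sketch of a parser for that layout follows. The function name and the regular expression are illustrative and are not part of the pipeline that wrote this log. The sample text is the Gradient Boosting test_recall entry copied from above.

```python
import re
from statistics import mean

# Sample copied from the Gradient Boosting record in this log.
SAMPLE = """key: test_recall
value: [0.96428571 0.96428571 1. 1. 0.92857143 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.975
"""


def parse_metrics(text):
    """Return {key: (fold_values, logged_mean)} for key/value/mean triplets."""
    pattern = re.compile(
        r"key: (\S+)\s+value: \[(.*?)\]\s+mean value: (\S+)", re.DOTALL
    )
    parsed = {}
    for key, values, logged_mean in pattern.findall(text):
        # The array may wrap across lines; split() handles any whitespace.
        folds = [float(v) for v in values.split()]
        parsed[key] = (folds, float(logged_mean))
    return parsed


metrics = parse_metrics(SAMPLE)
folds, logged_mean = metrics["test_recall"]
# Recompute the mean from the fold values to cross-check the logged one.
print(len(folds), mean(folds), logged_mean)
```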
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01340175 0.01410508 0.01441169 0.01409388 0.02937245 0.01492167
0.0153625 0.01417089 0.01423931 0.01514316]
mean value: 0.015922236442565917
key: score_time
value: [0.01146483 0.0109961 0.01089525 0.01093698 0.01172209 0.01099157
0.01097131 0.01087904 0.01158166 0.01099563]
mean value: 0.01114344596862793
key: test_mcc
value: [0.76550573 0.75462449 0.79161589 0.68850906 0.50518149 0.47187011
0.68250015 0.67900461 0.79385662 0.67900461]
mean value: 0.6811672742900674
key: train_mcc
value: [0.79484005 0.76863111 0.78816439 0.79111205 0.71433965 0.76123378
0.77349899 0.80474782 0.76277007 0.76987347]
mean value: 0.7729211371212047
key: test_accuracy
value: [0.87719298 0.87719298 0.89473684 0.84210526 0.75 0.73214286
0.83928571 0.83928571 0.89285714 0.83928571]
mean value: 0.8384085213032582
key: train_accuracy
value: [0.89546351 0.88362919 0.89151874 0.89349112 0.84251969 0.87795276
0.88385827 0.9015748 0.87795276 0.88385827]
mean value: 0.8831819099535635
key: test_fscore
value: [0.8627451 0.87272727 0.89285714 0.83636364 0.73076923 0.75409836
0.83018868 0.83636364 0.88461538 0.83636364]
mean value: 0.8337092078000177
key: train_fscore
value: [0.89026915 0.88032454 0.88469602 0.8875 0.81651376 0.88475836
0.87631027 0.89837398 0.86919831 0.8793456 ]
mean value: 0.8767290009085706
key: test_precision
value: [0.95652174 0.88888889 0.92592593 0.88461538 0.79166667 0.6969697
0.88 0.85185185 0.95833333 0.85185185]
mean value: 0.8686625339234035
key: train_precision
value: [0.93886463 0.90794979 0.94196429 0.93832599 0.97802198 0.83802817
0.93721973 0.92857143 0.93636364 0.91489362]
mean value: 0.9260203256453761
key: test_recall
value: [0.78571429 0.85714286 0.86206897 0.79310345 0.67857143 0.82142857
0.78571429 0.82142857 0.82142857 0.82142857]
mean value: 0.8048029556650246
key: train_recall
value: [0.84645669 0.85433071 0.83399209 0.84189723 0.7007874 0.93700787
0.82283465 0.87007874 0.81102362 0.84645669]
mean value: 0.8364865706015997
key: test_roc_auc
value: [0.87561576 0.87684729 0.8953202 0.8429803 0.75 0.73214286
0.83928571 0.83928571 0.89285714 0.83928571]
mean value: 0.8383620689655172
key: train_roc_auc
value: [0.89556036 0.88368709 0.8914055 0.89338956 0.84251969 0.87795276
0.88385827 0.9015748 0.87795276 0.88385827]
mean value: 0.8831759048893593
key: test_jcc
value: [0.75862069 0.77419355 0.80645161 0.71875 0.57575758 0.60526316
0.70967742 0.71875 0.79310345 0.71875 ]
mean value: 0.7179317452228509
key: train_jcc
value: [0.80223881 0.78623188 0.79323308 0.79775281 0.68992248 0.79333333
0.77985075 0.81549815 0.76865672 0.78467153]
mean value: 0.7811389546191971
MCC on Blind test: 0.3
Accuracy on Blind test: 0.71
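Note on the "Variables are collinear" warnings logged for the QDA run above: one possible contributor, offered as an assumption rather than a diagnosis, is the OneHotEncoder() step, which keeps every level of each categorical feature. Each encoded group then sums to 1 in every row, so any two groups are linearly dependent. The sketch shows this with two made-up columns. The values are hypothetical and only the column names are borrowed from the pipeline printout.

```python
def one_hot(values):
    """Encode a list of labels as 0/1 indicator rows, one column per level."""
    levels = sorted(set(values))
    return [[1 if v == lvl else 0 for lvl in levels] for v in values]


# Hypothetical values; only the column names appear in the log.
ss_class = ["helix", "sheet", "loop", "helix"]
active_site = ["yes", "no", "no", "yes"]

a = one_hot(ss_class)
b = one_hot(active_site)

# Every group sums to 1 per row, so sum(group A) - sum(group B) is 0 for all
# rows: a non-trivial linear combination of columns that is identically zero.
dependence = [sum(row_a) - sum(row_b) for row_a, row_b in zip(a, b)]
print(dependence)
```

If this is the cause, OneHotEncoder(drop='first') or the reg_param argument of QuadraticDiscriminantAnalysis are the usual scikit-learn settings to try. Strongly correlated numerical features would trigger the same warning.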
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01172638 0.01136661 0.01153398 0.02457762 0.01147556 0.01133466
0.0113709 0.02615333 0.0302875 0.0303762 ]
mean value: 0.018020272254943848
key: score_time
value: [0.01069403 0.01067114 0.01076221 0.01973557 0.01065874 0.01060033
0.01062369 0.01272726 0.01075745 0.01385522]
mean value: 0.012108564376831055
key: test_mcc
value: [0.8953202 0.8953202 0.85960591 0.79110556 0.71611487 0.82195294
0.67900461 0.71611487 0.68250015 0.82195294]
mean value: 0.7878992256362354
key: train_mcc
value: [0.83472439 0.83904026 0.81877755 0.82280791 0.83123063 0.8154727
0.81142619 0.82718204 0.8431734 0.81527029]
mean value: 0.8259105358013283
key: test_accuracy
value: [0.94736842 0.94736842 0.92982456 0.89473684 0.85714286 0.91071429
0.83928571 0.85714286 0.83928571 0.91071429]
mean value: 0.8933583959899749
key: train_accuracy
value: [0.91715976 0.91913215 0.90927022 0.9112426 0.91535433 0.90748031
0.90551181 0.91338583 0.92125984 0.90748031]
mean value: 0.9127277174672692
key: test_fscore
value: [0.94736842 0.94736842 0.93103448 0.9 0.86206897 0.90909091
0.84210526 0.85185185 0.84745763 0.90909091]
mean value: 0.8947436850691334
key: train_fscore
value: [0.91860465 0.92100193 0.91015625 0.9122807 0.91682785 0.90909091
0.90697674 0.91472868 0.92277992 0.90873786]
mean value: 0.9141185505002607
key: test_precision
value: [0.93103448 0.93103448 0.93103448 0.87096774 0.83333333 0.92592593
0.82758621 0.88461538 0.80645161 0.92592593]
mean value: 0.8867909579811694
key: train_precision
value: [0.90458015 0.90188679 0.8996139 0.9 0.90114068 0.89353612
0.89312977 0.90076336 0.90530303 0.89655172]
mean value: 0.899650553503409
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
0.85714286 0.82142857 0.89285714 0.89285714]
mean value: 0.904064039408867
key: train_recall
value: [0.93307087 0.94094488 0.92094862 0.92490119 0.93307087 0.92519685
0.92125984 0.92913386 0.94094488 0.92125984]
mean value: 0.9290731692135321
key: test_roc_auc
value: [0.9476601 0.9476601 0.92980296 0.89408867 0.85714286 0.91071429
0.83928571 0.85714286 0.83928571 0.91071429]
mean value: 0.8933497536945814
key: train_roc_auc
value: [0.91712832 0.91908904 0.90929321 0.91126949 0.91535433 0.90748031
0.90551181 0.91338583 0.92125984 0.90748031]
mean value: 0.9127252497588
key: test_jcc
value: [0.9 0.9 0.87096774 0.81818182 0.75757576 0.83333333
0.72727273 0.74193548 0.73529412 0.83333333]
mean value: 0.811789431315048
key: train_jcc
value: [0.84946237 0.85357143 0.83512545 0.83870968 0.84642857 0.83333333
0.82978723 0.84285714 0.85663082 0.83274021]
mean value: 0.8418646239168347
MCC on Blind test: 0.25
Accuracy on Blind test: 0.71
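Note on reading the per-fold metrics: the first-fold test values in the block above (test_mcc 0.8953202, test_accuracy 0.94736842, test_fscore 0.94736842, test_precision 0.93103448, test_recall 0.96428571, test_roc_auc 0.9476601, test_jcc 0.9) are all consistent with a single confusion matrix. The counts below are inferred from those values and are an assumption; the log does not print them. The sketch also assumes the ROC AUC was computed from hard class labels, in which case it equals balanced accuracy.

```python
from math import sqrt

# Confusion-matrix counts inferred from the logged first-fold values
# (assumption: these counts are not printed anywhere in the log).
tp, fp, fn, tn = 27, 2, 1, 27

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
specificity = tn / (tn + fp)
# AUC computed from hard labels reduces to balanced accuracy.
roc_auc = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [
    ("test_mcc", mcc),
    ("test_accuracy", accuracy),
    ("test_fscore", fscore),
    ("test_precision", precision),
    ("test_recall", recall),
    ("test_roc_auc", roc_auc),
    ("test_jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

The Jaccard index and F-score are tied together by jcc = fscore / (2 - fscore), which is why the two rows rise and fall together in every fold.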
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
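Note on the SettingWithCopyWarning raised at rpob_config.py lines 163 and 166 above: pandas emits it when `sort_values(..., inplace=True)` is called on a frame that was itself sliced from another frame. A minimal sketch of one common way to avoid it follows. The `scores` frame, its `source` column, and all values are hypothetical; only the names `ros_CT` and `test_mcc` come from the log.

```python
import pandas as pd

# Hypothetical results table; values are illustrative only.
scores = pd.DataFrame(
    {
        "model": ["A", "B", "C", "D"],
        "source": ["CT", "CT", "CT", "BT"],
        "test_mcc": [0.79, 0.81, 0.80, 0.25],
    }
)

# Take an explicit copy of the subset so it is an independent frame,
# then reassign the sorted result instead of sorting in place.
ros_CT = scores[scores["source"] == "CT"].copy()
ros_CT = ros_CT.sort_values(by=["test_mcc"], ascending=False)

print(ros_CT)
```

Either change alone (the `.copy()` or the reassignment) is usually enough to silence the warning; using both keeps the intent explicit.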
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
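Note on how output of this shape can be produced: the key/value/mean blocks that follow match the dictionary returned by scikit-learn's `cross_validate` when `return_train_score=True`. The sketch below rebuilds a reduced version of the pipeline printed above on synthetic data. The data, the category labels, the two-metric scoring dictionary, and the shuffled `StratifiedKFold` splitter are assumptions made for illustration; the log does not show how the actual folds were drawn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data: column names are taken from the log,
# values and category labels are made up.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame(
    {
        "ligand_distance": rng.normal(size=n),
        "ddg_foldx": rng.normal(size=n),
        "ss_class": rng.choice(["helix", "sheet", "loop"], size=n),
    }
)
y = (X["ligand_distance"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Scale numerical columns, one-hot encode categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "ddg_foldx"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", RidgeClassifierCV(cv=10))])

# Scorer names become the "test_<name>" / "train_<name>" keys.
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv, scoring=scoring, return_train_score=True
)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Because the scaler and encoder sit inside the pipeline, they are refitted on the training part of each fold, so no information from the held-out fold reaches the preprocessing step.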
key: fit_time
value: [0.11566854 0.13413811 0.27206016 0.20076489 0.19601989 0.19559526
0.2167592 0.19660378 0.19614172 0.1994133 ]
mean value: 0.19231648445129396
key: score_time
value: [0.01090479 0.02036548 0.0200057 0.02095532 0.0202384 0.01987505
0.0205245 0.01911926 0.01085591 0.01083922]
mean value: 0.017368364334106445
key: test_mcc
value: [0.85960591 0.8953202 0.85960591 0.82490815 0.75434227 0.82195294
0.71611487 0.71611487 0.68250015 0.85933785]
mean value: 0.7989803124894794
key: train_mcc
value: [0.86225372 0.86654135 0.85053095 0.85053095 0.87062545 0.85513299
0.83505996 0.85465533 0.86710997 0.8431734 ]
mean value: 0.8555614064171675
key: test_accuracy
value: [0.92982456 0.94736842 0.92982456 0.9122807 0.875 0.91071429
0.85714286 0.85714286 0.83928571 0.92857143]
mean value: 0.8987155388471177
key: train_accuracy
value: [0.93096647 0.93293886 0.92504931 0.92504931 0.93503937 0.92716535
0.91732283 0.92716535 0.93307087 0.92125984]
mean value: 0.927502756682042
key: test_fscore
value: [0.92857143 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091
0.86206897 0.85185185 0.84745763 0.92592593]
mean value: 0.8999979781378779
key: train_fscore
value: [0.93203883 0.93436293 0.92607004 0.92607004 0.93617021 0.92870906
0.91860465 0.92815534 0.93461538 0.92277992]
mean value: 0.9287576414141969
key: test_precision
value: [0.92857143 0.93103448 0.93103448 0.9 0.83870968 0.92592593
0.83333333 0.88461538 0.80645161 0.96153846]
mean value: 0.8941214789824357
key: train_precision
value: [0.91954023 0.91666667 0.91187739 0.91187739 0.92015209 0.90943396
0.90458015 0.91570881 0.91353383 0.90530303]
mean value: 0.9128673569164447
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.89285714 0.82142857 0.89285714 0.89285714]
mean value: 0.9076354679802956
key: train_recall
value: [0.94488189 0.95275591 0.94071146 0.94071146 0.95275591 0.9488189
0.93307087 0.94094488 0.95669291 0.94094488]
mean value: 0.9452289066633469
key: test_roc_auc
value: [0.92980296 0.9476601 0.92980296 0.91194581 0.875 0.91071429
0.85714286 0.85714286 0.83928571 0.92857143]
mean value: 0.8987068965517242
key: train_roc_auc
value: [0.93093897 0.93289969 0.92508014 0.92508014 0.93503937 0.92716535
0.91732283 0.92716535 0.93307087 0.92125984]
mean value: 0.927502256387912
key: test_jcc
value: [0.86666667 0.9 0.87096774 0.84375 0.78787879 0.83333333
0.75757576 0.74193548 0.73529412 0.86206897]
mean value: 0.8199470854425297
key: train_jcc
value: [0.87272727 0.87681159 0.86231884 0.86231884 0.88 0.86690647
0.84946237 0.86594203 0.87725632 0.85663082]
mean value: 0.8670374559548931
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02059269 0.0403645 0.0256114 0.04780722 0.05848503 0.02307177
0.02302694 0.02297401 0.02418137 0.02429724]
mean value: 0.031041216850280762
key: score_time
value: [0.0107677 0.01078558 0.01102161 0.01087213 0.01087928 0.01069665
0.01075411 0.01068592 0.0107131 0.01078057]
mean value: 0.010795664787292481
key: test_mcc
value: [0.63745526 0.78410665 0.60000053 0.89139151 0.78410665 0.89139151
0.89153439 0.86334835 0.89139151 0.81854376]
mean value: 0.805327011589605
key: train_mcc
value: [0.83096715 0.83450632 0.8435716 0.82679606 0.83450632 0.8224719
0.83074746 0.83041633 0.82643766 0.82660248]
mean value: 0.830702327712044
key: test_accuracy
value: [0.81818182 0.89090909 0.8 0.94545455 0.89090909 0.94545455
0.94545455 0.92727273 0.94545455 0.90909091]
mean value: 0.9018181818181819
key: train_accuracy
value: [0.91515152 0.91717172 0.92121212 0.91313131 0.91717172 0.91111111
0.91515152 0.91515152 0.91313131 0.91313131]
mean value: 0.9151515151515152
key: test_fscore
value: [0.80769231 0.89285714 0.79245283 0.94339623 0.89285714 0.94736842
0.94545455 0.93333333 0.94736842 0.9122807 ]
mean value: 0.9015061072657895
key: train_fscore
value: [0.91699605 0.91816367 0.92337917 0.91485149 0.91816367 0.912
0.91633466 0.91566265 0.91382766 0.91417166]
mean value: 0.9163550676695618
key: test_precision
value: [0.84 0.86206897 0.80769231 0.96153846 0.86206897 0.93103448
0.96296296 0.875 0.93103448 0.89655172]
mean value: 0.8929952352883387
key: train_precision
value: [0.89922481 0.90909091 0.90038314 0.89883268 0.90909091 0.90118577
0.90196078 0.90836653 0.9047619 0.9015748 ]
mean value: 0.903447224781149
key: test_recall
value: [0.77777778 0.92592593 0.77777778 0.92592593 0.92592593 0.96428571
0.92857143 1. 0.96428571 0.92857143]
mean value: 0.9119047619047619
key: train_recall
value: [0.93548387 0.92741935 0.94758065 0.93145161 0.92741935 0.92307692
0.93117409 0.92307692 0.92307692 0.92712551]
mean value: 0.9296885203082147
key: test_roc_auc
value: [0.81746032 0.89153439 0.79960317 0.94510582 0.89153439 0.94510582
0.9457672 0.92592593 0.94510582 0.90873016]
mean value: 0.9015873015873016
key: train_roc_auc
value: [0.91511036 0.91715097 0.92115874 0.91309423 0.91715097 0.91113524
0.91518382 0.91516749 0.91315136 0.91315953]
mean value: 0.9151462713856602
key: test_jcc
value: [0.67741935 0.80645161 0.65625 0.89285714 0.80645161 0.9
0.89655172 0.875 0.9 0.83870968]
mean value: 0.8249691125059591
key: train_jcc
value: [0.84671533 0.84870849 0.85766423 0.84306569 0.84870849 0.83823529
0.84558824 0.84444444 0.84132841 0.84191176]
mean value: 0.8456370381490419
MCC on Blind test: 0.28
Accuracy on Blind test: 0.7
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
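Note on the ConvergenceWarning above: the message itself names two remedies, raising `max_iter` or scaling the data. The pipelines logged earlier already apply `MinMaxScaler` to the listed numerical columns, so raising the iteration cap is the more likely fix here. A minimal sketch on synthetic data follows; the value `max_iter=3000` and the data are assumptions, not settings taken from the run.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data with features on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5)) * np.array([1, 10, 100, 1000, 10000])
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

# Scale first, then give lbfgs a higher iteration cap than the default 100.
# max_iter=3000 is an illustrative value, not one used in the logged run.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegressionCV(max_iter=3000, random_state=42),
)
model.fit(X, y)

print("training accuracy:", model.score(X, y))
```

If the warning persists after raising `max_iter`, the scikit-learn page linked in the message lists the alternative solvers.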
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.64936924 0.66852713 0.79406118 0.67012405 0.69154596 0.8323493
0.78064322 0.67432141 0.84689832 0.68367529]
mean value: 0.7291515111923218
key: score_time
value: [0.01181173 0.01206112 0.01192927 0.01092577 0.01228356 0.01233196
0.01223803 0.01151776 0.0125103 0.01225424]
mean value: 0.011986374855041504
key: test_mcc
value: [0.82269299 0.92980214 0.63745526 0.85449735 0.89642146 0.96423926
0.92980214 0.81878307 0.96423926 0.8565805 ]
mean value: 0.8674513428146936
key: train_mcc
value: [0.92727243 0.93132101 0.94355919 0.93535327 0.92730389 0.93131989
0.92324017 0.94355551 0.94346399 0.93538276]
mean value: 0.9341772124772538
key: test_accuracy
value: [0.90909091 0.96363636 0.81818182 0.92727273 0.94545455 0.98181818
0.96363636 0.90909091 0.98181818 0.92727273]
mean value: 0.9327272727272727
key: train_accuracy
value: [0.96363636 0.96565657 0.97171717 0.96767677 0.96363636 0.96565657
0.96161616 0.97171717 0.97171717 0.96767677]
mean value: 0.9670707070707071
key: test_fscore
value: [0.90196078 0.96428571 0.80769231 0.92592593 0.94736842 0.98245614
0.96296296 0.90909091 0.98245614 0.93103448]
mean value: 0.9315233788784553
key: train_fscore
value: [0.96370968 0.96565657 0.97154472 0.96774194 0.96356275 0.96551724
0.96161616 0.97142857 0.97154472 0.96747967]
mean value: 0.9669802011711329
key: test_precision
value: [0.95833333 0.93103448 0.84 0.92592593 0.9 0.96551724
1. 0.92592593 0.96551724 0.9 ]
mean value: 0.9312254150702427
key: train_precision
value: [0.96370968 0.96761134 0.9795082 0.96774194 0.96747967 0.96747967
0.95967742 0.97942387 0.9755102 0.97142857]
mean value: 0.9699570558428222
key: test_recall
value: [0.85185185 1. 0.77777778 0.92592593 1. 1.
0.92857143 0.89285714 1. 0.96428571]
mean value: 0.9341269841269841
key: train_recall
value: [0.96370968 0.96370968 0.96370968 0.96774194 0.95967742 0.96356275
0.96356275 0.96356275 0.96761134 0.96356275]
mean value: 0.9640410735274912
key: test_roc_auc
value: [0.90806878 0.96428571 0.81746032 0.92724868 0.94642857 0.98148148
0.96428571 0.90939153 0.98148148 0.9265873 ]
mean value: 0.9326719576719578
key: train_roc_auc
value: [0.96363622 0.96566051 0.97173338 0.96767664 0.96364438 0.96565234
0.96162009 0.97170073 0.97170889 0.96766847]
mean value: 0.9670701645553089
key: test_jcc
value: [0.82142857 0.93103448 0.67741935 0.86206897 0.9 0.96551724
0.92857143 0.83333333 0.96551724 0.87096774]
mean value: 0.8755858361142009
key: train_jcc
value: [0.92996109 0.93359375 0.94466403 0.9375 0.9296875 0.93333333
0.92607004 0.94444444 0.94466403 0.93700787]
mean value: 0.9360926093439301
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
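Note on the per-fold output above: the key names (fit_time, score_time, test_mcc, train_mcc, ... test_jcc, train_jcc) have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a dict of scorers and return_train_score=True. The following is a sketch under that assumption. The synthetic data, the choice of StratifiedKFold, and the exact scorer definitions are illustrative and may differ from what the logged script used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.1, random_state=42
)

# Scorer names chosen to reproduce the key suffixes seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# Ten folds, matching the ten values per key in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,
)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```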
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01078415 0.01023746 0.00817394 0.00776267 0.00750327 0.00742102
0.00754666 0.00760245 0.00785923 0.00745296]
mean value: 0.008234381675720215
key: score_time
value: [0.01067472 0.00926948 0.00836229 0.00813413 0.00799251 0.00794363
0.00806427 0.0079906 0.00824738 0.00798607]
mean value: 0.008466506004333496
key: test_mcc
value: [0.71588202 0.56841568 0.69419497 0.72546624 0.52715278 0.48393864
0.61131498 0.79069197 0.75878131 0.53758181]
mean value: 0.6413420391304201
key: train_mcc
value: [0.72945173 0.66755872 0.72778077 0.66639453 0.67908612 0.67326481
0.68618843 0.67555218 0.64604502 0.69582615]
mean value: 0.6847148445519045
key: test_accuracy
value: [0.85454545 0.78181818 0.83636364 0.85454545 0.76363636 0.72727273
0.8 0.89090909 0.87272727 0.76363636]
mean value: 0.8145454545454546
key: train_accuracy
value: [0.86464646 0.82626263 0.86060606 0.82626263 0.82828283 0.83030303
0.83636364 0.83030303 0.81616162 0.84242424]
mean value: 0.8361616161616161
key: test_fscore
value: [0.84 0.76 0.80851064 0.83333333 0.75471698 0.68085106
0.78431373 0.88461538 0.8627451 0.74509804]
mean value: 0.7954184263953551
key: train_fscore
value: [0.86354379 0.80630631 0.85097192 0.80717489 0.80369515 0.81165919
0.81797753 0.80995475 0.79458239 0.82666667]
mean value: 0.8192532586237161
key: test_precision
value: [0.91304348 0.82608696 0.95 0.95238095 0.76923077 0.84210526
0.86956522 0.95833333 0.95652174 0.82608696]
mean value: 0.8863354665929036
key: train_precision
value: [0.87242798 0.91326531 0.91627907 0.90909091 0.94054054 0.90954774
0.91919192 0.91794872 0.89795918 0.91625616]
mean value: 0.9112507526203477
key: test_recall
value: [0.77777778 0.7037037 0.7037037 0.74074074 0.74074074 0.57142857
0.71428571 0.82142857 0.78571429 0.67857143]
mean value: 0.7238095238095238
key: train_recall
value: [0.85483871 0.72177419 0.79435484 0.72580645 0.7016129 0.73279352
0.73684211 0.72469636 0.71255061 0.75303644]
mean value: 0.7458306125114275
key: test_roc_auc
value: [0.8531746 0.78042328 0.83399471 0.85251323 0.76322751 0.73015873
0.8015873 0.89219577 0.87433862 0.76521164]
mean value: 0.8146825396825397
key: train_roc_auc
value: [0.86466632 0.82647414 0.86074017 0.82646598 0.82853925 0.83010644
0.83616299 0.83009011 0.81595272 0.84224403]
mean value: 0.8361442144442993
key: test_jcc
value: [0.72413793 0.61290323 0.67857143 0.71428571 0.60606061 0.51612903
0.64516129 0.79310345 0.75862069 0.59375 ]
mean value: 0.6642723366270363
key: train_jcc
value: [0.75985663 0.6754717 0.7406015 0.67669173 0.67181467 0.68301887
0.69201521 0.68060837 0.65917603 0.70454545]
mean value: 0.6943800160411975
MCC on Blind test: 0.34
Accuracy on Blind test: 0.78
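Note on reading the blind-test lines: accuracy and MCC can diverge sharply when the classes are imbalanced, because MCC uses all four confusion-matrix cells. The following worked example computes both from a purely hypothetical confusion matrix (the counts are invented for illustration and are not the blind-test results of this run) and checks the hand-written formula against sklearn.metrics.matthews_corrcoef.

```python
import math

from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical confusion-matrix counts (NOT taken from this log).
tn, fp, fn, tp = 80, 5, 10, 5

# Rebuild label vectors that produce exactly those counts.
y_true = [0] * (tn + fp) + [1] * (fn + tp)
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp

# Accuracy only counts the diagonal of the confusion matrix.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc_manual = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Cross-check against scikit-learn.
mcc_sklearn = matthews_corrcoef(y_true, y_pred)
acc_sklearn = accuracy_score(y_true, y_pred)

print("accuracy:", round(accuracy, 2))
print("MCC (manual):", round(mcc_manual, 2))
print("MCC (sklearn):", round(mcc_sklearn, 2))
```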
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00835395 0.00815177 0.00788021 0.00780869 0.00764942 0.00775027
0.00766587 0.0077424 0.00780225 0.00769472]
mean value: 0.007849955558776855
key: score_time
value: [0.00892568 0.00875401 0.00813055 0.00789833 0.00855637 0.00791192
0.0080204 0.00795603 0.00823951 0.00798535]
mean value: 0.008237814903259278
key: test_mcc
value: [0.53452248 0.78410665 0.63745526 0.85449735 0.68504815 0.7112589
0.85695439 0.78353876 0.85695439 0.63841116]
mean value: 0.7342747491731905
key: train_mcc
value: [0.75012681 0.77383014 0.7860094 0.72613214 0.77778141 0.74958366
0.76193358 0.74958366 0.72166787 0.74199798]
mean value: 0.7538646637900721
key: test_accuracy
value: [0.76363636 0.89090909 0.81818182 0.92727273 0.83636364 0.85454545
0.92727273 0.89090909 0.92727273 0.81818182]
mean value: 0.8654545454545455
key: train_accuracy
value: [0.87474747 0.88686869 0.89292929 0.86262626 0.88888889 0.87474747
0.88080808 0.87474747 0.86060606 0.87070707]
mean value: 0.8767676767676768
key: test_fscore
value: [0.73469388 0.89285714 0.80769231 0.92592593 0.84745763 0.85185185
0.92592593 0.89655172 0.92592593 0.81481481]
mean value: 0.862369712380149
key: train_fscore
value: [0.87242798 0.888 0.89421158 0.85950413 0.88933602 0.87346939
0.88223553 0.87346939 0.85773196 0.8677686 ]
mean value: 0.8758154566969916
key: test_precision
value: [0.81818182 0.86206897 0.84 0.92592593 0.78125 0.88461538
0.96153846 0.86666667 0.96153846 0.84615385]
mean value: 0.8747939530137806
key: train_precision
value: [0.8907563 0.88095238 0.88537549 0.88135593 0.8875502 0.88065844
0.87007874 0.88065844 0.87394958 0.88607595]
mean value: 0.8817411452335624
key: test_recall
value: [0.66666667 0.92592593 0.77777778 0.92592593 0.92592593 0.82142857
0.89285714 0.92857143 0.89285714 0.78571429]
mean value: 0.8543650793650793
key: train_recall
value: [0.85483871 0.89516129 0.90322581 0.83870968 0.89112903 0.86639676
0.89473684 0.86639676 0.84210526 0.85020243]
mean value: 0.8702902572809195
key: test_roc_auc
value: [0.76190476 0.89153439 0.81746032 0.92724868 0.83796296 0.85515873
0.92791005 0.89021164 0.92791005 0.81878307]
mean value: 0.8656084656084656
key: train_roc_auc
value: [0.87478778 0.8868519 0.89290845 0.86267468 0.88888435 0.87473064
0.88083616 0.87473064 0.86056876 0.87066573]
mean value: 0.8767639088415828
key: test_jcc
value: [0.58064516 0.80645161 0.67741935 0.86206897 0.73529412 0.74193548
0.86206897 0.8125 0.86206897 0.6875 ]
mean value: 0.7627952627102008
key: train_jcc
value: [0.77372263 0.79856115 0.80866426 0.75362319 0.80072464 0.77536232
0.78928571 0.77536232 0.75090253 0.76642336]
mean value: 0.7792632101538037
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
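Note on the "Running model pipeline" entries: each printed Pipeline has a 'prep' step (a ColumnTransformer that applies MinMaxScaler to the numeric columns and OneHotEncoder to the categorical ones, with remainder='passthrough') followed by a 'model' step. The following is a small sketch of that structure. The column names are borrowed from the log, but the toy values, the labels, and the reduced column set are assumptions for illustration only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data (assumption): two numeric and two categorical features.
df = pd.DataFrame(
    {
        "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3],
        "rsa": [0.10, 0.55, 0.32, 0.80, 0.05, 0.61],
        "ss_class": ["helix", "sheet", "loop", "helix", "sheet", "loop"],
        "active_site": ["1", "0", "0", "0", "1", "0"],
    }
)
y = [1, 0, 0, 0, 1, 0]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Scale numeric columns to [0, 1]; one-hot encode categorical columns.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)

# Same two-step layout as the pipelines printed in the log.
pipeline = Pipeline(steps=[("prep", preprocessor), ("model", BernoulliNB())])
pipeline.fit(df, y)

# 2 scaled numeric columns + 3 ss_class levels + 2 active_site levels.
transformed = pipeline.named_steps["prep"].transform(df)
predictions = pipeline.predict(df)
print("transformed shape:", transformed.shape)
print("predictions:", predictions)
```

Keeping the preprocessing inside the Pipeline means the scaler and encoder are refit on each training fold during cross-validation, which avoids leaking information from the held-out fold.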
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00789285 0.00730848 0.00801301 0.00789571 0.0079782 0.0079236
0.00803757 0.00805783 0.00799203 0.00808692]
mean value: 0.007918620109558105
key: score_time
value: [0.01133084 0.0164237 0.01200867 0.01205802 0.01202822 0.01195431
0.01304603 0.01204181 0.01285148 0.0128901 ]
mean value: 0.012663316726684571
key: test_mcc
value: [0.63745526 0.63745526 0.56441351 0.85449735 0.61131498 0.81854376
0.81878307 0.86334835 0.89139151 0.63624339]
mean value: 0.7333446438524339
key: train_mcc
value: [0.80310724 0.76975822 0.81041362 0.76604064 0.82627008 0.79394672
0.7820578 0.77375802 0.77376541 0.78192653]
mean value: 0.7881044273086666
key: test_accuracy
value: [0.81818182 0.81818182 0.78181818 0.92727273 0.8 0.90909091
0.90909091 0.92727273 0.94545455 0.81818182]
mean value: 0.8654545454545455
key: train_accuracy
value: [0.9010101 0.88484848 0.90505051 0.88282828 0.91313131 0.8969697
0.89090909 0.88686869 0.88686869 0.89090909]
mean value: 0.8939393939393939
key: test_fscore
value: [0.80769231 0.80769231 0.76923077 0.92592593 0.81355932 0.9122807
0.90909091 0.93333333 0.94736842 0.82142857]
mean value: 0.864760256923504
key: train_fscore
value: [0.90373281 0.88438134 0.90656064 0.88492063 0.91313131 0.8969697
0.892 0.88617886 0.88709677 0.89156627]
mean value: 0.8946538330419604
key: test_precision
value: [0.84 0.84 0.8 0.92592593 0.75 0.89655172
0.92592593 0.875 0.93103448 0.82142857]
mean value: 0.8605866630176975
key: train_precision
value: [0.88122605 0.88979592 0.89411765 0.87109375 0.91497976 0.89516129
0.88142292 0.88979592 0.88353414 0.88446215]
mean value: 0.8885589547682757
key: test_recall
value: [0.77777778 0.77777778 0.74074074 0.92592593 0.88888889 0.92857143
0.89285714 1. 0.96428571 0.82142857]
mean value: 0.8718253968253968
key: train_recall
value: [0.92741935 0.87903226 0.91935484 0.89919355 0.91129032 0.89878543
0.90283401 0.88259109 0.89068826 0.89878543]
mean value: 0.9009974533106961
key: test_roc_auc
value: [0.81746032 0.81746032 0.78108466 0.92724868 0.8015873 0.90873016
0.90939153 0.92592593 0.94510582 0.81812169]
mean value: 0.8652116402116402
key: train_roc_auc
value: [0.90095664 0.88486026 0.90502155 0.88279515 0.91313504 0.89697336
0.89093313 0.88686006 0.88687639 0.89092497]
mean value: 0.893933655478647
key: test_jcc
value: [0.67741935 0.67741935 0.625 0.86206897 0.68571429 0.83870968
0.83333333 0.875 0.9 0.6969697 ]
mean value: 0.7671634668631332
key: train_jcc
value: [0.82437276 0.79272727 0.82909091 0.79359431 0.8401487 0.81318681
0.80505415 0.79562044 0.79710145 0.80434783]
mean value: 0.8095244624739278
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
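The `mean value` lines in each block can be reconciled with the per-fold arrays by hand. A minimal sketch, assuming each `value:` array holds one score per cross-validation fold (ten here) and that `mean value` is their arithmetic mean. The fold scores are copied from the K-Nearest Neighbors `test_mcc` entry above; the variable names are illustrative only.

```python
from statistics import fmean

# Per-fold test_mcc scores for K-Nearest Neighbors, as printed in the log
# (the array repr rounds them to 8 decimal places).
knn_test_mcc = [
    0.63745526, 0.63745526, 0.56441351, 0.85449735, 0.61131498,
    0.81854376, 0.81878307, 0.86334835, 0.89139151, 0.63624339,
]

# "mean value" reported by the log for the same key.
logged_mean = 0.7333446438524339

# Arithmetic mean of the printed fold scores.
recomputed_mean = fmean(knn_test_mcc)

# The two should agree up to the rounding of the printed fold scores.
print(f"recomputed={recomputed_mean:.8f} logged={logged_mean:.8f}")
```

The same check applies to any other key in the log, since every block prints the fold array followed by its mean.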
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01772094 0.01750326 0.01661658 0.01747656 0.01727486 0.0177145
0.01769352 0.01763797 0.01765108 0.01759934]
mean value: 0.017488861083984376
key: score_time
value: [0.01001716 0.00997353 0.0098772 0.00995278 0.01002908 0.01001978
0.01004839 0.0100143 0.00999403 0.01007152]
mean value: 0.009999775886535644
key: test_mcc
value: [0.56841568 0.78410665 0.63745526 0.89153439 0.78410665 0.81854376
0.85695439 0.82269299 0.89139151 0.70899471]
mean value: 0.7764195999656158
key: train_mcc
value: [0.80232908 0.78999446 0.80646861 0.77792658 0.78999446 0.78602685
0.78224023 0.78184638 0.77794469 0.79000817]
mean value: 0.7884779517517934
key: test_accuracy
value: [0.78181818 0.89090909 0.81818182 0.94545455 0.89090909 0.90909091
0.92727273 0.90909091 0.94545455 0.85454545]
mean value: 0.8872727272727272
key: train_accuracy
value: [0.9010101 0.89494949 0.9030303 0.88888889 0.89494949 0.89292929
0.89090909 0.89090909 0.88888889 0.89494949]
mean value: 0.8941414141414141
key: test_fscore
value: [0.76 0.89285714 0.80769231 0.94545455 0.89285714 0.9122807
0.92592593 0.91525424 0.94736842 0.85714286]
mean value: 0.8856833282025075
key: train_fscore
value: [0.90258449 0.896 0.9047619 0.89021956 0.896 0.89378758
0.89243028 0.89112903 0.88977956 0.89558233]
mean value: 0.895227473341023
key: test_precision
value: [0.82608696 0.86206897 0.84 0.92857143 0.86206897 0.89655172
0.96153846 0.87096774 0.93103448 0.85714286]
mean value: 0.8836031583641004
key: train_precision
value: [0.89019608 0.88888889 0.890625 0.88142292 0.88888889 0.88492063
0.87843137 0.8875502 0.88095238 0.88844622]
mean value: 0.8860322585475027
key: test_recall
value: [0.7037037 0.92592593 0.77777778 0.96296296 0.92592593 0.92857143
0.89285714 0.96428571 0.96428571 0.85714286]
mean value: 0.8903439153439153
key: train_recall
value: [0.91532258 0.90322581 0.91935484 0.89919355 0.90322581 0.90283401
0.90688259 0.89473684 0.89878543 0.90283401]
mean value: 0.9046395455139088
key: test_roc_auc
value: [0.78042328 0.89153439 0.81746032 0.9457672 0.89153439 0.90873016
0.92791005 0.90806878 0.94510582 0.85449735]
mean value: 0.8871031746031747
key: train_roc_auc
value: [0.90098113 0.89493274 0.90299726 0.88886803 0.89493274 0.89294926
0.8909413 0.89091681 0.88890884 0.89496539]
mean value: 0.8941393496147316
key: test_jcc
value: [0.61290323 0.80645161 0.67741935 0.89655172 0.80645161 0.83870968
0.86206897 0.84375 0.9 0.75 ]
mean value: 0.799430617352614
key: train_jcc
value: [0.82246377 0.8115942 0.82608696 0.80215827 0.8115942 0.80797101
0.8057554 0.80363636 0.80144404 0.81090909]
mean value: 0.8103613311859039
MCC on Blind test: 0.22
Accuracy on Blind test: 0.71
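The `test_jcc` values move in step with `test_fscore` because, for a binary positive class, the Jaccard index J and the F1 score F are linked by J = F / (2 - F). A sketch under the assumption that `jcc` denotes the Jaccard score and `fscore` the F1 score (the log does not define either key). The numbers are the SVM fold scores above, and `jaccard_from_f1` is a hypothetical helper, not part of the logged script.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the equivalent Jaccard index."""
    # F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN),
    # which rearranges to J = F1 / (2 - F1).
    return f1 / (2.0 - f1)


# SVM per-fold scores as printed in the log.
svm_test_fscore = [
    0.76, 0.89285714, 0.80769231, 0.94545455, 0.89285714,
    0.9122807, 0.92592593, 0.91525424, 0.94736842, 0.85714286,
]
svm_test_jcc = [
    0.61290323, 0.80645161, 0.67741935, 0.89655172, 0.80645161,
    0.83870968, 0.86206897, 0.84375, 0.9, 0.75,
]

# Derive Jaccard from each fold's F1 and compare with the logged value.
derived = [jaccard_from_f1(f) for f in svm_test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, svm_test_jcc))
print(f"largest difference from logged test_jcc: {max_abs_diff:.2e}")
```

If the assumption holds, the two keys carry the same information per fold, so ranking models by either one gives the same ordering within a fold.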
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.48249555 1.45221186 1.51775455 1.50475144 1.40030074 1.4671185
1.77495503 1.3907702 1.52468777 1.48590899]
mean value: 1.5000954627990724
key: score_time
value: [0.01191258 0.01389217 0.0141511 0.01366138 0.01346612 0.01382709
0.01353741 0.01376367 0.01363826 0.01388812]
mean value: 0.013573789596557617
key: test_mcc
value: [0.82269299 0.89153439 0.67602163 0.78353876 0.74603175 0.89153439
0.89642146 0.82269299 0.86334835 0.7112589 ]
mean value: 0.8105075610922889
key: train_mcc
value: [0.96364438 0.95154681 0.95962779 0.96767664 0.96780409 0.96780199
0.96364378 0.97575748 0.96770771 0.96770771]
mean value: 0.9652918361453778
key: test_accuracy
value: [0.90909091 0.94545455 0.83636364 0.89090909 0.87272727 0.94545455
0.94545455 0.90909091 0.92727273 0.85454545]
mean value: 0.9036363636363636
key: train_accuracy
value: [0.98181818 0.97575758 0.97979798 0.98383838 0.98383838 0.98383838
0.98181818 0.98787879 0.98383838 0.98383838]
mean value: 0.9826262626262626
key: test_fscore
value: [0.90196078 0.94545455 0.82352941 0.88461538 0.87272727 0.94545455
0.94339623 0.91525424 0.93333333 0.85185185]
mean value: 0.9017577593218595
key: train_fscore
value: [0.98181818 0.9757085 0.97975709 0.98387097 0.98373984 0.98367347
0.98174442 0.98785425 0.98373984 0.98373984]
mean value: 0.9825646391106369
key: test_precision
value: [0.95833333 0.92857143 0.875 0.92 0.85714286 0.96296296
1. 0.87096774 0.875 0.88461538]
mean value: 0.9132593708561451
key: train_precision
value: [0.98380567 0.9796748 0.98373984 0.98387097 0.99180328 0.99176955
0.98373984 0.98785425 0.9877551 0.9877551 ]
mean value: 0.9861768388410251
key: test_recall
value: [0.85185185 0.96296296 0.77777778 0.85185185 0.88888889 0.92857143
0.89285714 0.96428571 1. 0.82142857]
mean value: 0.8940476190476191
key: train_recall
value: [0.97983871 0.97177419 0.97580645 0.98387097 0.97580645 0.9757085
0.97975709 0.98785425 0.97975709 0.97975709]
mean value: 0.9789930782290714
key: test_roc_auc
value: [0.90806878 0.9457672 0.83531746 0.89021164 0.87301587 0.9457672
0.94642857 0.90806878 0.92592593 0.85515873]
mean value: 0.9033730158730159
key: train_roc_auc
value: [0.98182219 0.97576564 0.97980606 0.98383832 0.98385464 0.98382199
0.98181403 0.98787874 0.98383016 0.98383016]
mean value: 0.9826261917199948
key: test_jcc
value: [0.82142857 0.89655172 0.7 0.79310345 0.77419355 0.89655172
0.89285714 0.84375 0.875 0.74193548]
mean value: 0.8235371643095503
key: train_jcc
value: [0.96428571 0.95256917 0.96031746 0.96825397 0.968 0.96787149
0.96414343 0.976 0.968 0.968 ]
mean value: 0.9657441225056214
MCC on Blind test: 0.26
Accuracy on Blind test: 0.65
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01426578 0.01250982 0.01104569 0.00998855 0.01048017 0.01040983
0.01004791 0.00998306 0.00995708 0.01082873]
mean value: 0.010951662063598632
key: score_time
value: [0.01068878 0.00831199 0.00807023 0.00795078 0.007833 0.00783944
0.00795031 0.00782537 0.00784492 0.00789261]
mean value: 0.008220744132995606
key: test_mcc
value: [0.86334835 0.89153439 0.85449735 0.74569602 0.71735629 0.92724868
0.92724868 0.82269299 0.8565805 0.89153439]
mean value: 0.8497737644332478
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92727273 0.94545455 0.92727273 0.87272727 0.85454545 0.96363636
0.96363636 0.90909091 0.92727273 0.94545455]
mean value: 0.9236363636363636
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92 0.94545455 0.92592593 0.86792453 0.86206897 0.96428571
0.96428571 0.91525424 0.93103448 0.94545455]
mean value: 0.924168865927233
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.92857143 0.92592593 0.88461538 0.80645161 0.96428571
0.96428571 0.87096774 0.9 0.96296296]
mean value: 0.9208066485485841
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.85185185 0.96296296 0.92592593 0.85185185 0.92592593 0.96428571
0.96428571 0.96428571 0.96428571 0.92857143]
mean value: 0.9304232804232804
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92592593 0.9457672 0.92724868 0.8723545 0.85582011 0.96362434
0.96362434 0.90806878 0.9265873 0.9457672 ]
mean value: 0.9234788359788361
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85185185 0.89655172 0.86206897 0.76666667 0.75757576 0.93103448
0.93103448 0.84375 0.87096774 0.89655172]
mean value: 0.8608053397340105
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
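The Decision Tree block reports a perfect training score next to a much lower blind-test score. A small sketch that puts the three MCC figures side by side. The helper name `mcc_gaps` and the choice to report simple differences are illustrative assumptions, not something the logged script is known to do.

```python
def mcc_gaps(train_mcc: float, cv_test_mcc: float, blind_mcc: float) -> dict:
    """Return the drop in MCC from training to CV test, and from CV test to blind test."""
    return {
        "train_minus_cv": train_mcc - cv_test_mcc,
        "cv_minus_blind": cv_test_mcc - blind_mcc,
    }


# Mean train_mcc, mean test_mcc and blind-test MCC for the Decision Tree,
# copied from the log lines above.
gaps = mcc_gaps(train_mcc=1.0, cv_test_mcc=0.8497737644332478, blind_mcc=0.11)

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
```

A large `cv_minus_blind` value may suggest that the cross-validation folds are not representative of the blind set, though the log alone does not establish the cause.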
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10040808 0.10288572 0.1029706 0.10149503 0.09969401 0.10028243
0.10137939 0.10140562 0.10217285 0.10103154]
mean value: 0.10137252807617188
key: score_time
value: [0.01739192 0.01827312 0.01805353 0.01727128 0.01716781 0.01719832
0.01750255 0.01722026 0.01744318 0.01742435]
mean value: 0.017494630813598634
key: test_mcc
value: [0.78961518 0.82337971 0.71049701 0.92962225 0.72754449 0.8565805
0.96428571 0.86334835 0.89602867 0.78174603]
mean value: 0.8342647911704605
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89090909 0.90909091 0.85454545 0.96363636 0.85454545 0.92727273
0.98181818 0.92727273 0.94545455 0.89090909]
mean value: 0.9145454545454546
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88 0.9122807 0.84615385 0.96153846 0.86666667 0.93103448
0.98181818 0.93333333 0.94915254 0.89285714]
mean value: 0.915483535925352
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95652174 0.86666667 0.88 1. 0.78787879 0.9
1. 0.875 0.90322581 0.89285714]
mean value: 0.9062150142984645
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81481481 0.96296296 0.81481481 0.92592593 0.96296296 0.96428571
0.96428571 1. 1. 0.89285714]
mean value: 0.9302910052910053
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88955026 0.91005291 0.85383598 0.96296296 0.85648148 0.9265873
0.98214286 0.92592593 0.94444444 0.89087302]
mean value: 0.9142857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.78571429 0.83870968 0.73333333 0.92592593 0.76470588 0.87096774
0.96428571 0.875 0.90322581 0.80645161]
mean value: 0.8468319980321878
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.32
Accuracy on Blind test: 0.71
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00785303 0.00793576 0.00822353 0.00800967 0.00797462 0.00781012
0.00777817 0.00785923 0.00833726 0.0080986 ]
mean value: 0.00798799991607666
key: score_time
value: [0.00809956 0.00805473 0.00806642 0.00806236 0.00844526 0.00803256
0.00801802 0.00814533 0.0084455 0.00861764]
mean value: 0.008198738098144531
key: test_mcc
value: [0.67602163 0.86402765 0.49468252 0.67284827 0.79069197 0.89139151
0.92724868 0.81854376 0.81854376 0.34721618]
mean value: 0.7301215940014808
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83636364 0.92727273 0.74545455 0.83636364 0.89090909 0.94545455
0.96363636 0.90909091 0.90909091 0.67272727]
mean value: 0.8636363636363636
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.93103448 0.72 0.83018868 0.89655172 0.94736842
0.96428571 0.9122807 0.9122807 0.7 ]
mean value: 0.8637519836753659
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.875 0.87096774 0.7826087 0.84615385 0.83870968 0.93103448
0.96428571 0.89655172 0.89655172 0.65625 ]
mean value: 0.8558113606481056
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.77777778 1. 0.66666667 0.81481481 0.96296296 0.96428571
0.96428571 0.92857143 0.92857143 0.75 ]
mean value: 0.8757936507936508
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.83531746 0.92857143 0.74404762 0.83597884 0.89219577 0.94510582
0.96362434 0.90873016 0.90873016 0.6712963 ]
mean value: 0.8633597883597883
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.87096774 0.5625 0.70967742 0.8125 0.9
0.93103448 0.83870968 0.83870968 0.53846154]
mean value: 0.7702560537349191
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.19
Accuracy on Blind test: 0.65
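Note: the ten test_mcc values logged for Extra Tree above range from about 0.35 to 0.93, so the mean alone hides a wide fold-to-fold spread. The sketch below recomputes the logged mean from the printed fold values and adds a sample standard deviation. The standard deviation is an extra summary chosen here for illustration; the pipeline itself does not report it. Because the inputs are the rounded values from the log, the recomputed mean agrees with the logged mean only to about eight decimal places.

```python
from statistics import mean, stdev

# test_mcc fold values copied from the Extra Tree block of this log
test_mcc = [0.67602163, 0.86402765, 0.49468252, 0.67284827, 0.79069197,
            0.89139151, 0.92724868, 0.81854376, 0.81854376, 0.34721618]

fold_mean = mean(test_mcc)   # corresponds to the logged "mean value"
fold_sd = stdev(test_mcc)    # sample standard deviation (not reported by the log)
fold_min, fold_max = min(test_mcc), max(test_mcc)

print(f"mean={fold_mean:.4f} sd={fold_sd:.4f} min={fold_min:.4f} max={fold_max:.4f}")
```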
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.31766438 1.37345815 1.27461958 1.28742099 1.28492332 1.28699183
1.29451942 1.28101182 1.29123402 1.28577328]
mean value: 1.2977616786956787
key: score_time
value: [0.09975529 0.15991688 0.09050679 0.09112549 0.0906496 0.09106612
0.09059644 0.09088278 0.09059429 0.09115982]
mean value: 0.09862534999847412
key: test_mcc
value: [0.89602867 0.92980214 0.82269299 0.89139151 0.92980214 0.96423926
0.96428571 0.89602867 0.96423926 0.92724868]
mean value: 0.9185759034850024
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94545455 0.96363636 0.90909091 0.94545455 0.96363636 0.98181818
0.98181818 0.94545455 0.98181818 0.96363636]
mean value: 0.9581818181818181
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.96428571 0.90196078 0.94339623 0.96428571 0.98245614
0.98181818 0.94915254 0.98245614 0.96428571]
mean value: 0.9575273629067016
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.95833333 0.96153846 0.93103448 0.96551724
1. 0.90322581 0.96551724 0.96428571]
mean value: 0.9580486763884984
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.85185185 0.92592593 1. 1.
0.96428571 1. 1. 0.96428571]
mean value: 0.9595238095238096
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.96428571 0.90806878 0.94510582 0.96428571 0.98148148
0.98214286 0.94444444 0.98148148 0.96362434]
mean value: 0.957936507936508
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.93103448 0.82142857 0.89285714 0.93103448 0.96551724
0.96428571 0.90322581 0.96551724 0.93103448]
mean value: 0.9194824054946413
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.51
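Note: the per-fold scores above are all functions of one confusion matrix per fold. As a worked check, the first Random Forest fold (precision 1.0, recall 0.88888889, accuracy 0.94545455) is consistent with a 55-sample fold holding 27 positives and 28 negatives, with 3 false negatives and no false positives. Those counts are inferred from the logged scores and are an assumption; the log does not print them. Under that assumption, the sketch below reproduces the logged fold values from the standard definitions. The logged test_roc_auc matches balanced accuracy, which is what ROC AUC reduces to when it is computed from hard class predictions rather than from scores.

```python
import math

# Assumed confusion matrix for fold 1 (inferred from the logged scores,
# not printed anywhere in the log): 27 positives, 28 negatives.
tp, fn, fp, tn = 24, 3, 0, 28

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)                 # Jaccard index of the positive class
roc_auc = (recall + specificity) / 2      # balanced accuracy (AUC on hard labels)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jcc", jcc), ("roc_auc", roc_auc), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```

The same arithmetic can be applied to any other fold once its confusion matrix is known.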
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.91016841 0.90267611 0.96385503 0.89476395 0.87512565 0.92300725
0.92452002 0.88380647 0.98288059 0.92339325]
mean value: 0.9184196710586547
key: score_time
value: [0.24338746 0.25723505 0.2364409 0.19419646 0.26264167 0.20185113
0.21379876 0.27131295 0.2565093 0.25926137]
mean value: 0.2396635055541992
key: test_mcc
value: [0.89602867 0.92980214 0.82269299 0.89139151 0.92980214 0.92724868
0.96428571 0.89602867 0.96423926 0.92724868]
mean value: 0.914876845571549
key: train_mcc
value: [0.94766581 0.95154523 0.95574863 0.94766581 0.95163767 0.94754543
0.94371421 0.95556354 0.9395879 0.94767006]
mean value: 0.9488344285838511
key: test_accuracy
value: [0.94545455 0.96363636 0.90909091 0.94545455 0.96363636 0.96363636
0.98181818 0.94545455 0.98181818 0.96363636]
mean value: 0.9563636363636363
key: train_accuracy
value: [0.97373737 0.97575758 0.97777778 0.97373737 0.97575758 0.97373737
0.97171717 0.97777778 0.96969697 0.97373737]
mean value: 0.9743434343434344
key: test_fscore
value: [0.94117647 0.96428571 0.90196078 0.94339623 0.96428571 0.96428571
0.98181818 0.94915254 0.98245614 0.96428571]
mean value: 0.9557103203001853
key: train_fscore
value: [0.9740519 0.97590361 0.97804391 0.9740519 0.976 0.97384306
0.972 0.97777778 0.96993988 0.9739479 ]
mean value: 0.974555993072763
key: test_precision
value: [1. 0.93103448 0.95833333 0.96153846 0.93103448 0.96428571
1. 0.90322581 0.96551724 0.96428571]
mean value: 0.9579255236791389
key: train_precision
value: [0.96442688 0.972 0.96837945 0.96442688 0.96825397 0.968
0.96047431 0.97580645 0.96031746 0.96428571]
mean value: 0.9666371104351469
key: test_recall
value: [0.88888889 1. 0.85185185 0.92592593 1. 0.96428571
0.96428571 1. 1. 0.96428571]
mean value: 0.955952380952381
key: train_recall
value: [0.98387097 0.97983871 0.98790323 0.98387097 0.98387097 0.97975709
0.98380567 0.97975709 0.97975709 0.98380567]
mean value: 0.9826237429802795
key: test_roc_auc
value: [0.94444444 0.96428571 0.90806878 0.94510582 0.96428571 0.96362434
0.98214286 0.94444444 0.98148148 0.96362434]
mean value: 0.9561507936507937
key: train_roc_auc
value: [0.97371686 0.97574931 0.97775728 0.97371686 0.97574115 0.97374951
0.97174154 0.97778177 0.96971725 0.97375767]
mean value: 0.9743429215097297
key: test_jcc
value: [0.88888889 0.93103448 0.82142857 0.89285714 0.93103448 0.93103448
0.96428571 0.90322581 0.96551724 0.93103448]
mean value: 0.9160341296325724
key: train_jcc
value: [0.94941634 0.95294118 0.95703125 0.94941634 0.953125 0.94901961
0.94552529 0.95652174 0.94163424 0.94921875]
mean value: 0.9503849741342992
MCC on Blind test: 0.2
Accuracy on Blind test: 0.52
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01822448 0.00774431 0.00775194 0.00768995 0.0077107 0.00761104
0.00783753 0.00774765 0.00785279 0.00792027]
mean value: 0.008809065818786621
key: score_time
value: [0.01004004 0.00803638 0.00816011 0.00804186 0.00801206 0.00807762
0.00851011 0.00811744 0.0079689 0.00807619]
mean value: 0.008304071426391602
key: test_mcc
value: [0.53452248 0.78410665 0.63745526 0.85449735 0.68504815 0.7112589
0.85695439 0.78353876 0.85695439 0.63841116]
mean value: 0.7342747491731905
key: train_mcc
value: [0.75012681 0.77383014 0.7860094 0.72613214 0.77778141 0.74958366
0.76193358 0.74958366 0.72166787 0.74199798]
mean value: 0.7538646637900721
key: test_accuracy
value: [0.76363636 0.89090909 0.81818182 0.92727273 0.83636364 0.85454545
0.92727273 0.89090909 0.92727273 0.81818182]
mean value: 0.8654545454545455
key: train_accuracy
value: [0.87474747 0.88686869 0.89292929 0.86262626 0.88888889 0.87474747
0.88080808 0.87474747 0.86060606 0.87070707]
mean value: 0.8767676767676768
key: test_fscore
value: [0.73469388 0.89285714 0.80769231 0.92592593 0.84745763 0.85185185
0.92592593 0.89655172 0.92592593 0.81481481]
mean value: 0.862369712380149
key: train_fscore
value: [0.87242798 0.888 0.89421158 0.85950413 0.88933602 0.87346939
0.88223553 0.87346939 0.85773196 0.8677686 ]
mean value: 0.8758154566969916
key: test_precision
value: [0.81818182 0.86206897 0.84 0.92592593 0.78125 0.88461538
0.96153846 0.86666667 0.96153846 0.84615385]
mean value: 0.8747939530137806
key: train_precision
value: [0.8907563 0.88095238 0.88537549 0.88135593 0.8875502 0.88065844
0.87007874 0.88065844 0.87394958 0.88607595]
mean value: 0.8817411452335624
key: test_recall
value: [0.66666667 0.92592593 0.77777778 0.92592593 0.92592593 0.82142857
0.89285714 0.92857143 0.89285714 0.78571429]
mean value: 0.8543650793650793
key: train_recall
value: [0.85483871 0.89516129 0.90322581 0.83870968 0.89112903 0.86639676
0.89473684 0.86639676 0.84210526 0.85020243]
mean value: 0.8702902572809195
key: test_roc_auc
value: [0.76190476 0.89153439 0.81746032 0.92724868 0.83796296 0.85515873
0.92791005 0.89021164 0.92791005 0.81878307]
mean value: 0.8656084656084656
key: train_roc_auc
value: [0.87478778 0.8868519 0.89290845 0.86267468 0.88888435 0.87473064
0.88083616 0.87473064 0.86056876 0.87066573]
mean value: 0.8767639088415828
key: test_jcc
value: [0.58064516 0.80645161 0.67741935 0.86206897 0.73529412 0.74193548
0.86206897 0.8125 0.86206897 0.6875 ]
mean value: 0.7627952627102008
key: train_jcc
value: [0.77372263 0.79856115 0.80866426 0.75362319 0.80072464 0.77536232
0.78928571 0.77536232 0.75090253 0.76642336]
mean value: 0.7792632101538037
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.24260426 0.04918981 0.05251241 0.05346513 0.05343127 0.05336189
0.05402541 0.05282021 0.05996943 0.0555234 ]
mean value: 0.07269032001495361
key: score_time
value: [0.01044059 0.0108521 0.01054335 0.01029539 0.0097692 0.01019621
0.01065063 0.01008081 0.00973535 0.01028442]
mean value: 0.010284805297851562
key: test_mcc
value: [0.89602867 0.96428571 0.89139151 0.89139151 0.92724868 0.96423926
0.96423926 0.89602867 0.92962225 0.92724868]
mean value: 0.9251724200737174
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94545455 0.98181818 0.94545455 0.94545455 0.96363636 0.98181818
0.98181818 0.94545455 0.96363636 0.96363636]
mean value: 0.9618181818181818
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.98181818 0.94339623 0.94339623 0.96296296 0.98245614
0.98245614 0.94915254 0.96551724 0.96428571]
mean value: 0.9616617846939229
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96153846 0.96153846 0.96296296 0.96551724
0.96551724 0.90322581 0.93333333 0.96428571]
mean value: 0.9582204937154881
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.92592593 0.92592593 0.96296296 1.
1. 1. 1. 0.96428571]
mean value: 0.9667989417989418
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.98214286 0.94510582 0.94510582 0.96362434 0.98148148
0.98148148 0.94444444 0.96296296 0.96362434]
mean value: 0.9614417989417989
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.96428571 0.89285714 0.89285714 0.92857143 0.96551724
0.96551724 0.90322581 0.93333333 0.93103448]
mean value: 0.9266088422762505
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.38
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01531887 0.04107666 0.04202557 0.04105949 0.04171562 0.01796556
0.01756859 0.0427742 0.04265285 0.01785755]
mean value: 0.032001495361328125
key: score_time
value: [0.01036215 0.02154422 0.01961827 0.02058625 0.02116394 0.01096559
0.01085567 0.02103901 0.02116776 0.01119232]
mean value: 0.016849517822265625
key: test_mcc
value: [0.67284827 0.78410665 0.64214885 0.89153439 0.78410665 0.89139151
0.89153439 0.8565805 0.92724868 0.78353876]
mean value: 0.8125038646516862
key: train_mcc
value: [0.8435716 0.85067196 0.87981045 0.85892085 0.85478898 0.86365469
0.85916382 0.86702055 0.85107823 0.84299263]
mean value: 0.8571673756575152
key: test_accuracy
value: [0.83636364 0.89090909 0.81818182 0.94545455 0.89090909 0.94545455
0.94545455 0.92727273 0.96363636 0.89090909]
mean value: 0.9054545454545454
key: train_accuracy
value: [0.92121212 0.92525253 0.93939394 0.92929293 0.92727273 0.93131313
0.92929293 0.93333333 0.92525253 0.92121212]
mean value: 0.9282828282828283
key: test_fscore
value: [0.83018868 0.89285714 0.8 0.94545455 0.89285714 0.94736842
0.94545455 0.93103448 0.96428571 0.89655172]
mean value: 0.9046052398103557
key: train_fscore
value: [0.92337917 0.9261477 0.94094488 0.9304175 0.92828685 0.93280632
0.9304175 0.93413174 0.92644135 0.92246521]
mean value: 0.9295438225256318
key: test_precision
value: [0.84615385 0.86206897 0.86956522 0.92857143 0.86206897 0.93103448
0.96296296 0.9 0.96428571 0.86666667]
mean value: 0.8993378249825026
key: train_precision
value: [0.90038314 0.91699605 0.91923077 0.91764706 0.91732283 0.91119691
0.9140625 0.92125984 0.91015625 0.90625 ]
mean value: 0.9134505355609847
key: test_recall
value: [0.81481481 0.92592593 0.74074074 0.96296296 0.92592593 0.96428571
0.92857143 0.96428571 0.96428571 0.92857143]
mean value: 0.912037037037037
key: train_recall
value: [0.94758065 0.93548387 0.96370968 0.94354839 0.93951613 0.95546559
0.94736842 0.94736842 0.94331984 0.93927126]
mean value: 0.9462632231944625
key: test_roc_auc
value: [0.83597884 0.89153439 0.81679894 0.9457672 0.89153439 0.94510582
0.9457672 0.9265873 0.96362434 0.89021164]
mean value: 0.9052910052910054
key: train_roc_auc
value: [0.92115874 0.92523181 0.93934472 0.92926407 0.92724794 0.93136183
0.92932937 0.93336163 0.92528895 0.92124853]
mean value: 0.9282837599582081
key: test_jcc
value: [0.70967742 0.80645161 0.66666667 0.89655172 0.80645161 0.9
0.89655172 0.87096774 0.93103448 0.8125 ]
mean value: 0.8296852984797923
key: train_jcc
value: [0.85766423 0.86245353 0.88847584 0.86988848 0.866171 0.87407407
0.86988848 0.87640449 0.86296296 0.85608856]
mean value: 0.8684071649301385
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01084638 0.00925279 0.00854015 0.0082531 0.0082016 0.00828433
0.00811386 0.0079782 0.00824904 0.00820422]
mean value: 0.008592367172241211
key: score_time
value: [0.01097393 0.00891614 0.00880241 0.00852537 0.00860667 0.0081892
0.00847697 0.00831318 0.00843 0.00821924]
mean value: 0.008745312690734863
key: test_mcc
value: [0.60876172 0.78410665 0.63745526 0.89153439 0.71735629 0.81854376
0.85695439 0.82269299 0.89139151 0.70899471]
mean value: 0.7737791675722022
key: train_mcc
value: [0.79012008 0.77375802 0.79409222 0.76162335 0.77778141 0.77376541
0.76589215 0.76970043 0.76565561 0.78592069]
mean value: 0.7758309364223852
key: test_accuracy
value: [0.8 0.89090909 0.81818182 0.94545455 0.85454545 0.90909091
0.92727273 0.90909091 0.94545455 0.85454545]
mean value: 0.8854545454545455
key: train_accuracy
value: [0.89494949 0.88686869 0.8969697 0.88080808 0.88888889 0.88686869
0.88282828 0.88484848 0.88282828 0.89292929]
mean value: 0.8878787878787879
key: test_fscore
value: [0.7755102 0.89285714 0.80769231 0.94545455 0.86206897 0.9122807
0.92592593 0.91525424 0.94736842 0.85714286]
mean value: 0.8841555308766806
key: train_fscore
value: [0.89641434 0.8875502 0.89820359 0.88080808 0.88933602 0.88709677
0.884 0.88438134 0.88259109 0.89336016]
mean value: 0.8883741600170872
key: test_precision
value: [0.86363636 0.86206897 0.84 0.92857143 0.80645161 0.89655172
0.96153846 0.87096774 0.93103448 0.85714286]
mean value: 0.8817963638141614
key: train_precision
value: [0.88582677 0.884 0.88932806 0.88259109 0.8875502 0.88353414
0.87351779 0.88617886 0.88259109 0.888 ]
mean value: 0.8843118006828748
key: test_recall
value: [0.7037037 0.92592593 0.77777778 0.96296296 0.92592593 0.92857143
0.89285714 0.96428571 0.96428571 0.85714286]
mean value: 0.8903439153439153
key: train_recall
value: [0.90725806 0.89112903 0.90725806 0.87903226 0.89112903 0.89068826
0.89473684 0.88259109 0.88259109 0.89878543]
mean value: 0.8925199164163511
key: test_roc_auc
value: [0.79828042 0.89153439 0.81746032 0.9457672 0.85582011 0.90873016
0.92791005 0.90806878 0.94510582 0.85449735]
mean value: 0.8853174603174603
key: train_roc_auc
value: [0.89492458 0.88686006 0.89694887 0.88081168 0.88888435 0.88687639
0.88285229 0.88484393 0.8828278 0.8929411 ]
mean value: 0.8878771059161551
key: test_jcc
value: [0.63333333 0.80645161 0.67741935 0.89655172 0.75757576 0.83870968
0.86206897 0.84375 0.9 0.75 ]
mean value: 0.7965860425725554
key: train_jcc
value: [0.81227437 0.79783394 0.81521739 0.78700361 0.80072464 0.79710145
0.7921147 0.79272727 0.78985507 0.80727273]
mean value: 0.7992125159422541
MCC on Blind test: 0.29
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0109973 0.01174498 0.01200843 0.01177812 0.01327801 0.01348186
0.01264 0.01432228 0.02716994 0.01318049]
mean value: 0.014060139656066895
key: score_time
value: [0.00866151 0.01013803 0.01016569 0.01051307 0.01054215 0.01067781
0.01127958 0.01156712 0.02180862 0.01062226]
mean value: 0.011597585678100587
key: test_mcc
value: [0.75724019 0.78353876 0.67602163 0.75878131 0.71588202 0.89153439
0.96428571 0.89602867 0.92724868 0.92980214]
mean value: 0.830036349769513
key: train_mcc
value: [0.90767739 0.75673387 0.89988762 0.82550688 0.81837405 0.89599275
0.84921709 0.857966 0.89212884 0.86668482]
mean value: 0.857016930626917
key: test_accuracy
value: [0.87272727 0.89090909 0.83636364 0.87272727 0.85454545 0.94545455
0.98181818 0.94545455 0.96363636 0.96363636]
mean value: 0.9127272727272727
key: train_accuracy
value: [0.95353535 0.87474747 0.94949495 0.90909091 0.90505051 0.94747475
0.92323232 0.92727273 0.94545455 0.93131313]
mean value: 0.9266666666666666
key: test_fscore
value: [0.85714286 0.88461538 0.82352941 0.88135593 0.84 0.94545455
0.98181818 0.94915254 0.96428571 0.96296296]
mean value: 0.9090317532620623
key: train_fscore
value: [0.95277207 0.86580087 0.94845361 0.91493384 0.89804772 0.94605809
0.91983122 0.93023256 0.94386694 0.92765957]
mean value: 0.9247656499131667
key: test_precision
value: [0.95454545 0.92 0.875 0.8125 0.91304348 0.96296296
1. 0.90322581 0.96428571 1. ]
mean value: 0.9305563416506615
key: train_precision
value: [0.9707113 0.93457944 0.97046414 0.86120996 0.97183099 0.97021277
0.96035242 0.89219331 0.97008547 0.97757848]
mean value: 0.9479218264509782
key: test_recall
value: [0.77777778 0.85185185 0.77777778 0.96296296 0.77777778 0.92857143
0.96428571 1. 0.96428571 0.92857143]
mean value: 0.8933862433862434
key: train_recall
value: [0.93548387 0.80645161 0.92741935 0.97580645 0.83467742 0.92307692
0.88259109 0.97165992 0.91902834 0.88259109]
mean value: 0.9058786078098472
key: test_roc_auc
value: [0.87103175 0.89021164 0.83531746 0.87433862 0.8531746 0.9457672
0.98214286 0.94444444 0.96362434 0.96428571]
mean value: 0.9124338624338624
key: train_roc_auc
value: [0.95357189 0.87488573 0.94953964 0.90895586 0.90519296 0.94742556
0.92315039 0.92736222 0.94540127 0.9312149 ]
mean value: 0.92667004048583
key: test_jcc
value: [0.75 0.79310345 0.7 0.78787879 0.72413793 0.89655172
0.96428571 0.90322581 0.93103448 0.92857143]
mean value: 0.837878932339444
key: train_jcc
value: [0.90980392 0.76335878 0.90196078 0.84320557 0.81496063 0.8976378
0.8515625 0.86956522 0.89370079 0.86507937]
mean value: 0.8610835354490294
MCC on Blind test: 0.23
Accuracy on Blind test: 0.64
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0145483 0.01222491 0.01315165 0.01257706 0.01379681 0.01266575
0.0136075 0.01250958 0.014539 0.01288724]
mean value: 0.013250780105590821
key: score_time
value: [0.01066113 0.01051998 0.01067948 0.01066899 0.01064396 0.01054311
0.01067638 0.01064086 0.01070881 0.01063132]
mean value: 0.010637402534484863
key: test_mcc
value: [0.81854376 0.82269299 0.71049701 0.83147942 0.74935731 0.83147942
0.8565805 0.80032673 0.85695439 0.83251448]
mean value: 0.8110426022713871
key: train_mcc
value: [0.89986978 0.78532023 0.91930903 0.90350829 0.79835384 0.80158821
0.82922447 0.76325368 0.8535924 0.78649322]
mean value: 0.8340513160102464
key: test_accuracy
value: [0.90909091 0.90909091 0.85454545 0.90909091 0.87272727 0.90909091
0.92727273 0.89090909 0.92727273 0.90909091]
mean value: 0.9018181818181817
key: train_accuracy
value: [0.94949495 0.88484848 0.95959596 0.95151515 0.89090909 0.89494949
0.91111111 0.87272727 0.92323232 0.88484848]
mean value: 0.9123232323232323
key: test_fscore
value: [0.90566038 0.90196078 0.84615385 0.89795918 0.8627451 0.91803279
0.93103448 0.90322581 0.92592593 0.90196078]
mean value: 0.8994659075873878
key: train_fscore
value: [0.95069034 0.87248322 0.96 0.95081967 0.87892377 0.90298507
0.91634981 0.88482633 0.91774892 0.87133183]
mean value: 0.9106158951845009
key: test_precision
value: [0.92307692 0.95833333 0.88 1. 0.91666667 0.84848485
0.9 0.82352941 0.96153846 1. ]
mean value: 0.9211629644864939
key: train_precision
value: [0.93050193 0.9798995 0.95238095 0.96666667 0.98989899 0.83737024
0.86379928 0.80666667 0.98604651 0.98469388]
mean value: 0.9297924618150225
key: test_recall
value: [0.88888889 0.85185185 0.81481481 0.81481481 0.81481481 1.
0.96428571 1. 0.89285714 0.82142857]
mean value: 0.8863756613756614
key: train_recall
value: [0.97177419 0.78629032 0.96774194 0.93548387 0.79032258 0.97975709
0.9757085 0.97975709 0.8582996 0.78137652]
mean value: 0.9026511688650908
key: test_roc_auc
value: [0.90873016 0.90806878 0.85383598 0.90740741 0.87169312 0.90740741
0.9265873 0.88888889 0.92791005 0.91071429]
mean value: 0.9011243386243386
key: train_roc_auc
value: [0.94944985 0.885048 0.95957947 0.9515476 0.89111271 0.89512048
0.91124135 0.87294306 0.92310141 0.88463987]
mean value: 0.9123783792608071
key: test_jcc
value: [0.82758621 0.82142857 0.73333333 0.81481481 0.75862069 0.84848485
0.87096774 0.82352941 0.86206897 0.82142857]
mean value: 0.8182263155259295
key: train_jcc
value: [0.90601504 0.77380952 0.92307692 0.90625 0.784 0.82312925
0.84561404 0.79344262 0.848 0.772 ]
mean value: 0.8375337394219651
MCC on Blind test: 0.23
Accuracy on Blind test: 0.65
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.10882998 0.09430504 0.09354877 0.09708071 0.09502983 0.09874582
0.09948301 0.10101175 0.10146403 0.1005578 ]
mean value: 0.09900567531585694
key: score_time
value: [0.01448226 0.01464081 0.01445723 0.01509094 0.01563764 0.01575255
0.01563501 0.01564932 0.01564193 0.01550007]
mean value: 0.015248775482177734
key: test_mcc
value: [0.89602867 0.92980214 0.74569602 0.89139151 0.96428571 0.96423926
0.92724868 0.89602867 0.96423926 0.92724868]
mean value: 0.9106208597080531
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94545455 0.96363636 0.87272727 0.94545455 0.98181818 0.98181818
0.96363636 0.94545455 0.98181818 0.96363636]
mean value: 0.9545454545454546
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.96428571 0.86792453 0.94339623 0.98181818 0.98245614
0.96428571 0.94915254 0.98245614 0.96428571]
mean value: 0.9541237373055177
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.93103448 0.88461538 0.96153846 0.96428571 0.96551724
0.96428571 0.90322581 0.96551724 0.96428571]
mean value: 0.9504305760979843
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88888889 1. 0.85185185 0.92592593 1. 1.
0.96428571 1. 1. 0.96428571]
mean value: 0.9595238095238096
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94444444 0.96428571 0.8723545 0.94510582 0.98214286 0.98148148
0.96362434 0.94444444 0.98148148 0.96362434]
mean value: 0.9542989417989418
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.93103448 0.76666667 0.89285714 0.96428571 0.96551724
0.93103448 0.90322581 0.96551724 0.93103448]
mean value: 0.9140062150184508
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.4
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.04016852 0.0324285 0.02866578 0.0298512 0.03131413 0.04710269
0.03221202 0.03381968 0.03193164 0.03873563]
mean value: 0.03462297916412353
key: score_time
value: [0.02359104 0.02912378 0.0166347 0.03185606 0.01757717 0.01780367
0.02005625 0.01775956 0.02499628 0.03069973]
mean value: 0.023009824752807616
key: test_mcc
value: [0.96423926 0.92980214 0.8565805 0.85449735 0.92980214 0.96423926
1. 0.89602867 0.96423926 0.89153439]
mean value: 0.9250962972081643
key: train_mcc
value: [0.99195168 0.9838707 0.97980606 0.9878869 0.98383832 0.97980573
0.97980606 0.99596768 0.97172522 0.98795103]
mean value: 0.9842609370911118
key: test_accuracy
value: [0.98181818 0.96363636 0.92727273 0.92727273 0.96363636 0.98181818
1. 0.94545455 0.98181818 0.94545455]
mean value: 0.9618181818181818
key: train_accuracy
value: [0.9959596 0.99191919 0.98989899 0.99393939 0.99191919 0.98989899
0.98989899 0.9979798 0.98585859 0.99393939]
mean value: 0.9921212121212121
key: test_fscore
value: [0.98113208 0.96428571 0.92307692 0.92592593 0.96428571 0.98245614
1. 0.94915254 0.98245614 0.94545455]
mean value: 0.9618225721575157
key: train_fscore
value: [0.99595142 0.99190283 0.98989899 0.99393939 0.99193548 0.98985801
0.98989899 0.9979716 0.98585859 0.99389002]
mean value: 0.9921105329450134
key: test_precision
value: [1. 0.93103448 0.96 0.92592593 0.93103448 0.96551724
1. 0.90322581 0.96551724 0.96296296]
mean value: 0.9545218143616364
key: train_precision
value: [1. 0.99593496 0.99190283 0.99595142 0.99193548 0.99186992
0.98790323 1. 0.98387097 1. ]
mean value: 0.9939368806480281
key: test_recall
value: [0.96296296 1. 0.88888889 0.92592593 1. 1.
1. 1. 1. 0.92857143]
mean value: 0.9706349206349206
key: train_recall
value: [0.99193548 0.98790323 0.98790323 0.99193548 0.99193548 0.98785425
0.99190283 0.99595142 0.98785425 0.98785425]
mean value: 0.990302990727439
key: test_roc_auc
value: [0.98148148 0.96428571 0.9265873 0.92724868 0.96428571 0.98148148
1. 0.94444444 0.98148148 0.9457672 ]
mean value: 0.9617063492063492
key: train_roc_auc
value: [0.99596774 0.99192732 0.98990303 0.99394345 0.99191916 0.98989487
0.98990303 0.99797571 0.98586261 0.99392713]
mean value: 0.9921224043359018
key: test_jcc
value: [0.96296296 0.93103448 0.85714286 0.86206897 0.93103448 0.96551724
1. 0.90322581 0.96551724 0.89655172]
mean value: 0.9275055764488467
key: train_jcc
value: [0.99193548 0.98393574 0.98 0.98795181 0.984 0.97991968
0.98 0.99595142 0.97211155 0.98785425]
mean value: 0.9843659934587685
MCC on Blind test: 0.1
Accuracy on Blind test: 0.34
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.09521818 0.14583254 0.16052413 0.17061996 0.16926599 0.16544843
0.14057255 0.18005562 0.18195176 0.14595151]
mean value: 0.15554406642913818
key: score_time
value: [0.01243162 0.02033305 0.01982188 0.02640724 0.02747393 0.01361799
0.02945495 0.02985644 0.0272851 0.0134213 ]
mean value: 0.02201035022735596
key: test_mcc
value: [0.64214885 0.78410665 0.60000053 0.89139151 0.68504815 0.8565805
0.85695439 0.86334835 0.89139151 0.74569602]
mean value: 0.781666645397211
key: train_mcc
value: [0.86751154 0.84656958 0.86751154 0.85498218 0.85478898 0.83883199
0.84716822 0.85466123 0.85085332 0.85500107]
mean value: 0.8537879647323043
key: test_accuracy
value: [0.81818182 0.89090909 0.8 0.94545455 0.83636364 0.92727273
0.92727273 0.92727273 0.94545455 0.87272727]
mean value: 0.889090909090909
key: train_accuracy
value: [0.93333333 0.92323232 0.93333333 0.92727273 0.92727273 0.91919192
0.92323232 0.92727273 0.92525253 0.92727273]
mean value: 0.9266666666666666
key: test_fscore
value: [0.8 0.89285714 0.79245283 0.94339623 0.84745763 0.93103448
0.92592593 0.93333333 0.94736842 0.87719298]
mean value: 0.8891018972106213
key: train_fscore
value: [0.93491124 0.924 0.93491124 0.92857143 0.92828685 0.92031873
0.92460317 0.92771084 0.9261477 0.92828685]
mean value: 0.92777480666249
key: test_precision
value: [0.86956522 0.86206897 0.80769231 0.96153846 0.78125 0.9
0.96153846 0.875 0.93103448 0.86206897]
mean value: 0.8811756861953639
key: train_precision
value: [0.91505792 0.91666667 0.91505792 0.9140625 0.91732283 0.90588235
0.90661479 0.92031873 0.91338583 0.91372549]
mean value: 0.9138095012428894
key: test_recall
value: [0.74074074 0.92592593 0.77777778 0.92592593 0.92592593 0.96428571
0.89285714 1. 0.96428571 0.89285714]
mean value: 0.9010582010582011
key: train_recall
value: [0.95564516 0.93145161 0.95564516 0.94354839 0.93951613 0.93522267
0.94331984 0.93522267 0.93927126 0.94331984]
mean value: 0.9422162726916548
key: test_roc_auc
value: [0.81679894 0.89153439 0.79960317 0.94510582 0.83796296 0.9265873
0.92791005 0.92592593 0.94510582 0.8723545 ]
mean value: 0.8888888888888888
key: train_roc_auc
value: [0.93328817 0.92321568 0.93328817 0.92723978 0.92724794 0.91922424
0.92327282 0.92728876 0.92528079 0.92730508]
mean value: 0.9266651430063993
key: test_jcc
value: [0.66666667 0.80645161 0.65625 0.89285714 0.73529412 0.87096774
0.86206897 0.875 0.9 0.78125 ]
mean value: 0.804680624752682
key: train_jcc
value: [0.87777778 0.85873606 0.87777778 0.86666667 0.866171 0.85239852
0.8597786 0.86516854 0.86245353 0.866171 ]
mean value: 0.8653099481832294
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
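The per-fold scores in the Gaussian Process block above are mutually consistent with a single confusion matrix per fold. As a rough check, the sketch below recomputes the first fold from the counts TP=20, FP=3, FN=7, TN=25. These counts are not printed in the log; they are inferred here from the logged precision (20/23), recall (20/27) and accuracy (45/55), so treat them as an assumption. The formulas are the standard definitions. The logged test_roc_auc matches the mean of recall and specificity, which would be expected if the scorer received hard class labels, but the log does not confirm how it was computed.

```python
import math

# Counts inferred from fold 1 of the Gaussian Process block (an assumption,
# not printed in the log): 27 positives, 28 negatives, 55 samples in total.
tp, fp, fn, tn = 20, 3, 7, 25

precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fp + fn + tn)
fscore = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)

# Matthews correlation coefficient from the four counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# With hard 0/1 predictions, ROC AUC reduces to balanced accuracy.
balanced_accuracy = (recall + specificity) / 2

for name, value in [
    ("test_mcc", mcc),
    ("test_accuracy", accuracy),
    ("test_fscore", fscore),
    ("test_precision", precision),
    ("test_recall", recall),
    ("test_roc_auc", balanced_accuracy),
    ("test_jcc", jaccard),
]:
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared with the first entry of the corresponding `value:` array in the block above.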
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.25400209 0.25350809 0.25353813 0.25287437 0.24922609 0.24631763
0.25787902 0.25717902 0.25663543 0.25559354]
mean value: 0.25367534160614014
key: score_time
value: [0.00894332 0.00885177 0.00907326 0.00875664 0.00890183 0.00927663
0.00877619 0.00920916 0.00972295 0.00899959]
mean value: 0.009051132202148437
key: test_mcc
value: [0.92962225 0.96428571 0.89139151 0.89139151 0.89153439 0.96423926
0.96423926 0.89602867 0.96423926 0.92724868]
mean value: 0.9284220498029769
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96363636 0.98181818 0.94545455 0.94545455 0.94545455 0.98181818
0.98181818 0.94545455 0.98181818 0.96363636]
mean value: 0.9636363636363636
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96153846 0.98181818 0.94339623 0.94339623 0.94545455 0.98245614
0.98245614 0.94915254 0.98245614 0.96428571]
mean value: 0.9636410319352604
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.96428571 0.96153846 0.96153846 0.92857143 0.96551724
0.96551724 0.90322581 0.96551724 0.96428571]
mean value: 0.9579997310809324
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92592593 1. 0.92592593 0.92592593 0.96296296 1.
1. 1. 1. 0.96428571]
mean value: 0.9705026455026455
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96296296 0.98214286 0.94510582 0.94510582 0.9457672 0.98148148
0.98148148 0.94444444 0.98148148 0.96362434]
mean value: 0.9633597883597884
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.92592593 0.96428571 0.89285714 0.89285714 0.89655172 0.96551724
0.96551724 0.90322581 0.96551724 0.93103448]
mean value: 0.9303289663412022
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.3
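The Gradient Boosting block above reports a perfect training fit, a high cross-validated MCC and a much lower blind-test MCC. A minimal sketch of how that gap could be tabulated from the logged means follows. The dictionary layout and the function name `generalisation_gaps` are hypothetical and not part of the original scripts; the numbers are copied from this log.

```python
# Mean MCC values copied from the log for two models in this run.
logged = {
    "Gaussian Process": {
        "train_mcc": 0.8537879647323043,
        "test_mcc": 0.781666645397211,
        "blind_mcc": 0.29,
    },
    "Gradient Boosting": {
        "train_mcc": 1.0,
        "test_mcc": 0.9284220498029769,
        "blind_mcc": 0.1,
    },
}


def generalisation_gaps(scores):
    """Return (train minus CV, CV minus blind) MCC differences for one model."""
    return (
        scores["train_mcc"] - scores["test_mcc"],
        scores["test_mcc"] - scores["blind_mcc"],
    )


for model, scores in logged.items():
    fit_gap, blind_gap = generalisation_gaps(scores)
    print(f"{model}: train-CV gap {fit_gap:.3f}, CV-blind gap {blind_gap:.3f}")
```

A large CV-to-blind gap next to a small train-to-CV gap suggests the cross-validation folds may not be representative of the blind set, though the log alone does not establish the cause.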
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01195788 0.0138123 0.01438642 0.01398134 0.014189 0.01624131
0.01444888 0.0144527 0.01394033 0.0147655 ]
mean value: 0.014217567443847657
key: score_time
value: [0.01118398 0.01099157 0.01105928 0.01112914 0.01111221 0.01204634
0.01111531 0.01175618 0.01196051 0.01118875]
mean value: 0.011354327201843262
key: test_mcc
value: [0.35634832 0.71735629 0.65060574 0.68300095 0.6005291 0.47230166
0.70899471 0.71735629 0.69688314 0.49734925]
mean value: 0.6100725452940903
key: train_mcc
value: [0.68737636 0.74945491 0.78935739 0.77491061 0.8187082 0.79126011
0.79359843 0.78561297 0.78312126 0.78838114]
mean value: 0.7761781373175624
key: test_accuracy
value: [0.65454545 0.85454545 0.81818182 0.83636364 0.8 0.72727273
0.85454545 0.85454545 0.83636364 0.74545455]
mean value: 0.7981818181818182
key: train_accuracy
value: [0.82626263 0.87070707 0.89090909 0.88282828 0.90707071 0.89292929
0.89494949 0.88888889 0.88888889 0.89090909]
mean value: 0.8834343434343435
key: test_fscore
value: [0.71641791 0.86206897 0.79166667 0.81632653 0.8 0.69387755
0.85714286 0.84615385 0.81632653 0.73076923]
mean value: 0.7930750088942501
key: train_fscore
value: [0.85017422 0.86086957 0.88311688 0.87336245 0.90212766 0.88602151
0.8893617 0.88017429 0.88172043 0.88311688]
mean value: 0.8790045582018875
key: test_precision
value: [0.6 0.80645161 0.9047619 0.90909091 0.78571429 0.80952381
0.85714286 0.91666667 0.95238095 0.79166667]
mean value: 0.8333399664851278
key: train_precision
value: [0.74846626 0.93396226 0.95327103 0.95238095 0.95495495 0.94495413
0.93721973 0.95283019 0.94036697 0.94883721]
mean value: 0.9267243687033652
key: test_recall
value: [0.88888889 0.92592593 0.7037037 0.74074074 0.81481481 0.60714286
0.85714286 0.78571429 0.71428571 0.67857143]
mean value: 0.7716931216931217
key: train_recall
value: [0.98387097 0.7983871 0.82258065 0.80645161 0.85483871 0.8340081
0.84615385 0.81781377 0.82995951 0.82591093]
mean value: 0.8419975186104218
key: test_roc_auc
value: [0.65873016 0.85582011 0.81613757 0.83465608 0.80026455 0.72949735
0.85449735 0.85582011 0.83862434 0.74669312]
mean value: 0.799074074074074
key: train_roc_auc
value: [0.82594358 0.87085347 0.89104741 0.88298289 0.90717644 0.8928105
0.89485112 0.88874559 0.88877008 0.89077805]
mean value: 0.8833959122371686
key: test_jcc
value: [0.55813953 0.75757576 0.65517241 0.68965517 0.66666667 0.53125
0.75 0.73333333 0.68965517 0.57575758]
mean value: 0.6607205626837744
key: train_jcc
value: [0.73939394 0.75572519 0.79069767 0.7751938 0.82170543 0.7953668
0.80076628 0.78599222 0.78846154 0.79069767]
mean value: 0.7844000539129116
MCC on Blind test: 0.3
Accuracy on Blind test: 0.74
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02006578 0.03406477 0.0319221 0.02957749 0.03143239 0.0272727
0.02576208 0.02969098 0.03094196 0.0119288 ]
mean value: 0.02726590633392334
key: score_time
value: [0.01922798 0.01971698 0.03047991 0.01091385 0.0175004 0.01788068
0.02109575 0.01848054 0.02028871 0.0110805 ]
mean value: 0.018666529655456544
key: test_mcc
value: [0.63745526 0.78410665 0.56841568 0.89153439 0.78410665 0.8565805
0.85695439 0.89602867 0.89139151 0.74569602]
mean value: 0.7912269728112449
key: train_mcc
value: [0.82706373 0.81437091 0.83515329 0.8265827 0.84258914 0.82682144
0.81851887 0.81438908 0.81873585 0.81457838]
mean value: 0.8238803386905313
key: test_accuracy
value: [0.81818182 0.89090909 0.78181818 0.94545455 0.89090909 0.92727273
0.92727273 0.94545455 0.94545455 0.87272727]
mean value: 0.8945454545454545
key: train_accuracy
value: [0.91313131 0.90707071 0.91717172 0.91313131 0.92121212 0.91313131
0.90909091 0.90707071 0.90909091 0.90707071]
mean value: 0.9117171717171717
key: test_fscore
value: [0.80769231 0.89285714 0.76 0.94545455 0.89285714 0.93103448
0.92592593 0.94915254 0.94736842 0.87719298]
mean value: 0.8929535493427339
key: train_fscore
value: [0.91518738 0.90836653 0.91913215 0.91451292 0.92215569 0.91451292
0.91017964 0.908 0.91053678 0.90836653]
mean value: 0.9130950547952092
key: test_precision
value: [0.84 0.86206897 0.82608696 0.92857143 0.86206897 0.9
0.96153846 0.90322581 0.93103448 0.86206897]
mean value: 0.8876664032393586
key: train_precision
value: [0.8957529 0.8976378 0.8996139 0.90196078 0.91304348 0.8984375
0.8976378 0.8972332 0.89453125 0.89411765]
mean value: 0.8989966247132423
key: test_recall
value: [0.77777778 0.92592593 0.7037037 0.96296296 0.92592593 0.96428571
0.89285714 1. 0.96428571 0.89285714]
mean value: 0.9010582010582011
key: train_recall
value: [0.93548387 0.91935484 0.93951613 0.92741935 0.93145161 0.93117409
0.92307692 0.91902834 0.92712551 0.92307692]
mean value: 0.9276707587828131
key: test_roc_auc
value: [0.81746032 0.89153439 0.78042328 0.9457672 0.89153439 0.9265873
0.92791005 0.94444444 0.94510582 0.8723545 ]
mean value: 0.8943121693121694
key: train_roc_auc
value: [0.91308607 0.90704584 0.91712649 0.91310239 0.92119139 0.91316769
0.90911911 0.90709482 0.90912727 0.90710298]
mean value: 0.9117164032911061
key: test_jcc
value: [0.67741935 0.80645161 0.61290323 0.89655172 0.80645161 0.87096774
0.86206897 0.90322581 0.9 0.78125 ]
mean value: 0.8117290044493882
key: train_jcc
value: [0.84363636 0.83211679 0.85036496 0.84249084 0.85555556 0.84249084
0.83516484 0.83150183 0.83576642 0.83211679]
mean value: 0.840120523434392
MCC on Blind test: 0.25
Accuracy on Blind test: 0.71
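Each metric in this log is printed as a `key:` line, a `value:` array of ten fold scores that may wrap across lines, and a `mean value:` line. A small sketch of how one such triple could be parsed and its mean re-checked is given below. The helper name `parse_block` is hypothetical; the snippet is the Ridge Classifier test_accuracy entry copied from above.

```python
import re
from statistics import fmean

# One key/value/mean triple copied from the Ridge Classifier block.
snippet = """key: test_accuracy
value: [0.81818182 0.89090909 0.78181818 0.94545455 0.89090909 0.92727273
0.92727273 0.94545455 0.94545455 0.87272727]
mean value: 0.8945454545454545"""


def parse_block(text):
    """Extract the metric name, per-fold values and logged mean from one triple."""
    key = re.search(r"key: (\S+)", text).group(1)
    # The array may span several lines, so let '.' match newlines.
    array_text = re.search(r"value: \[(.*?)\]", text, flags=re.S).group(1)
    values = [float(token) for token in array_text.split()]
    logged_mean = float(re.search(r"mean value: (\S+)", text).group(1))
    return key, values, logged_mean


key, values, logged_mean = parse_block(snippet)
print(key, len(values), fmean(values), logged_mean)
```

The recomputed mean agrees with the logged one only up to the eight decimal places printed for the per-fold values.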
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.1858604 0.14565659 0.11904073 0.20053315 0.21871543 0.19877052
0.23351502 0.23374367 0.09664297 0.18243432]
mean value: 0.18149127960205078
key: score_time
value: [0.02734494 0.01139116 0.02203178 0.01164579 0.0204699 0.02237654
0.02139163 0.02069759 0.01105142 0.02118349]
mean value: 0.018958425521850585
key: test_mcc
value: [0.67284827 0.78410665 0.64214885 0.89153439 0.78410665 0.8565805
0.89153439 0.8565805 0.92724868 0.78353876]
mean value: 0.809022763950793
key: train_mcc
value: [0.8393547 0.81437091 0.86751154 0.85478898 0.85067196 0.86365469
0.85916382 0.86308561 0.84716822 0.83883199]
mean value: 0.8498602425342839
key: test_accuracy
value: [0.83636364 0.89090909 0.81818182 0.94545455 0.89090909 0.92727273
0.94545455 0.92727273 0.96363636 0.89090909]
mean value: 0.9036363636363636
key: train_accuracy
value: [0.91919192 0.90707071 0.93333333 0.92727273 0.92525253 0.93131313
0.92929293 0.93131313 0.92323232 0.91919192]
mean value: 0.9246464646464646
key: test_fscore
value: [0.83018868 0.89285714 0.8 0.94545455 0.89285714 0.93103448
0.94545455 0.93103448 0.96428571 0.89655172]
mean value: 0.9029718459809546
key: train_fscore
value: [0.92125984 0.90836653 0.93491124 0.92828685 0.9261477 0.93280632
0.9304175 0.93227092 0.92460317 0.92031873]
mean value: 0.9259388811346168
key: test_precision
value: [0.84615385 0.86206897 0.86956522 0.92857143 0.86206897 0.9
0.96296296 0.9 0.96428571 0.86666667]
mean value: 0.8962343767066405
key: train_precision
value: [0.9 0.8976378 0.91505792 0.91732283 0.91699605 0.91119691
0.9140625 0.91764706 0.90661479 0.90588235]
mean value: 0.910241820136384
key: test_recall
value: [0.81481481 0.92592593 0.74074074 0.96296296 0.92592593 0.96428571
0.92857143 0.96428571 0.96428571 0.92857143]
mean value: 0.912037037037037
key: train_recall
value: [0.94354839 0.91935484 0.95564516 0.93951613 0.93548387 0.95546559
0.94736842 0.94736842 0.94331984 0.93522267]
mean value: 0.942229332636803
key: test_roc_auc
value: [0.83597884 0.89153439 0.81679894 0.9457672 0.89153439 0.9265873
0.9457672 0.9265873 0.96362434 0.89021164]
mean value: 0.9034391534391535
key: train_roc_auc
value: [0.91914261 0.90704584 0.93328817 0.92724794 0.92523181 0.93136183
0.92932937 0.9313455 0.92327282 0.91922424]
mean value: 0.9246490139741413
key: test_jcc
value: [0.70967742 0.80645161 0.66666667 0.89655172 0.80645161 0.87096774
0.89655172 0.87096774 0.93103448 0.8125 ]
mean value: 0.8267820726733407
key: train_jcc
value: [0.8540146 0.83211679 0.87777778 0.866171 0.86245353 0.87407407
0.86988848 0.87313433 0.8597786 0.85239852]
mean value: 0.8621807699995009
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
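The `SettingWithCopyWarning` messages from `rpob_config.py` in this run are raised where `sort_values(..., inplace = True)` is called on a frame that pandas treats as a slice of another frame. One common way to avoid this is to take an explicit copy and sort without `inplace`. The sketch below assumes a small results table: the frame name `scores` and its layout are hypothetical, the column names `test_mcc` and `bts_mcc` come from the warning text, and the values are rounded means from this log.

```python
import pandas as pd

# Hypothetical results table; values are rounded means taken from this log.
scores = pd.DataFrame({
    "model": ["Gaussian Process", "Gradient Boosting", "QDA"],
    "test_mcc": [0.78, 0.93, 0.61],
    "bts_mcc": [0.29, 0.10, 0.30],
})

# Explicit copy so the subset is an independent frame, not a slice of `scores`.
cv_table = scores[["model", "test_mcc"]].copy()
# sort_values returns a new frame, so nothing is mutated in place.
cv_sorted = cv_table.sort_values(by="test_mcc", ascending=False)

bt_sorted = scores[["model", "bts_mcc"]].copy().sort_values(
    by="bts_mcc", ascending=False
)

print(cv_sorted)
print(bt_sorted)
```

Whether the original script can adopt this depends on how `rus_CT` and `rus_BT` are built, which the log does not show.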
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
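The repr above, together with the `fit_time`, `score_time`, `test_*` and `train_*` keys that follow it, is consistent with a scaling/encoding pipeline scored by 10-fold cross-validation. The sketch below is one way such output could be produced, not the script's actual code: the toy data, the fold settings (`shuffle`, `random_state`) and the choice of three scorers are all assumptions, and only four column names are borrowed from the repr.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic toy data (NOT taken from the log); only the column names are borrowed.
n = 60
toy = pd.DataFrame({
    "ligand_distance": [float(i % 17) for i in range(n)],
    "rsa": [(i * 7 % 10) / 10 for i in range(n)],
    "ss_class": [("helix", "sheet", "loop")[i % 3] for i in range(n)],
    "active_site": [i % 2 for i in range(n)],
})
# Hypothetical binary target derived from one feature, for illustration only.
y = [1 if d > 8 else 0 for d in toy["ligand_distance"]]

num_cols = ["ligand_distance", "rsa"]
cat_cols = ["ss_class", "active_site"]

# Same step names and transformers as in the logged repr.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        remainder="passthrough",
        transformers=[
            ("num", MinMaxScaler(), num_cols),
            ("cat", OneHotEncoder(), cat_cols),
        ],
    )),
    ("model", LogisticRegression(random_state=42)),
])

# A dict of scorers yields keys such as test_mcc / train_mcc in the result.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "jcc": make_scorer(jaccard_score),
}

# Assumed fold settings; the log only shows that there are 10 values per key.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, toy, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("mean value:", value.mean())
```

Fitting the scaler and encoder inside the pipeline means they are re-fitted on each training fold, so the held-out fold does not leak into the preprocessing.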
key: fit_time
value: [0.03266501 0.02544355 0.02698326 0.03438282 0.02959514 0.02234364
0.02767587 0.02475405 0.0223968 0.02460122]
mean value: 0.02708413600921631
key: score_time
value: [0.01088929 0.01079488 0.01119494 0.01080465 0.01083374 0.01082158
0.01083565 0.01080179 0.01084828 0.01076126]
mean value: 0.010858607292175294
key: test_mcc
value: [0.86189955 0.76689254 0.75462449 0.9321832 0.75047877 0.89342711
0.85933785 0.82195294 0.71611487 0.82195294]
mean value: 0.8178864271069979
key: train_mcc
value: [0.83842049 0.85032927 0.83456039 0.8314851 0.84698856 0.80724303
0.8154727 0.81912621 0.84698856 0.82718204]
mean value: 0.8317796354275596
key: test_accuracy
value: [0.92982456 0.87719298 0.87719298 0.96491228 0.875 0.94642857
0.92857143 0.91071429 0.85714286 0.91071429]
mean value: 0.9077694235588972
key: train_accuracy
value: [0.91913215 0.92504931 0.91715976 0.91518738 0.92322835 0.90354331
0.90748031 0.90944882 0.92322835 0.91338583]
mean value: 0.9156843560235444
key: test_fscore
value: [0.93103448 0.8852459 0.88135593 0.96428571 0.87719298 0.94545455
0.92592593 0.90909091 0.86206897 0.9122807 ]
mean value: 0.9093936061086217
key: train_fscore
value: [0.92007797 0.92607004 0.91796875 0.91714836 0.9245648 0.90448343
0.90909091 0.91050584 0.9245648 0.91472868]
mean value: 0.9169203576302117
key: test_precision
value: [0.9 0.81818182 0.86666667 1. 0.86206897 0.96296296
0.96153846 0.92592593 0.83333333 0.89655172]
mean value: 0.9027229858264341
key: train_precision
value: [0.91119691 0.91538462 0.90733591 0.89473684 0.90874525 0.8957529
0.89353612 0.9 0.90874525 0.90076336]
mean value: 0.90361971465238
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.92857143
0.89285714 0.89285714 0.89285714 0.92857143]
mean value: 0.918472906403941
key: train_recall
value: [0.92913386 0.93700787 0.92885375 0.94071146 0.94094488 0.91338583
0.92519685 0.92125984 0.94094488 0.92913386]
mean value: 0.9306573091407052
key: test_roc_auc
value: [0.93041872 0.87869458 0.87684729 0.96551724 0.875 0.94642857
0.92857143 0.91071429 0.85714286 0.91071429]
mean value: 0.9080049261083745
key: train_roc_auc
value: [0.91911238 0.92502568 0.91718278 0.91523762 0.92322835 0.90354331
0.90748031 0.90944882 0.92322835 0.91338583]
mean value: 0.9156873424418785
key: test_jcc
value: [0.87096774 0.79411765 0.78787879 0.93103448 0.78125 0.89655172
0.86206897 0.83333333 0.75757576 0.83870968]
mean value: 0.8353488117615334
key: train_jcc
value: [0.85198556 0.86231884 0.84837545 0.84697509 0.85971223 0.82562278
0.83333333 0.83571429 0.85971223 0.84285714]
mean value: 0.8466606938515135
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
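The per-fold values above can be checked against the reported means, and the cross-validated MCC compared with the blind-test MCC. The sketch below uses only the numbers printed in this log for Logistic Regression; the variable names are illustrative. Note that the blind-test figure is only available to two decimals here, so the second gap is approximate.

```python
from statistics import fmean

# Per-fold MCC values copied from the log above (Logistic Regression).
test_mcc = [0.86189955, 0.76689254, 0.75462449, 0.9321832, 0.75047877,
            0.89342711, 0.85933785, 0.82195294, 0.71611487, 0.82195294]
train_mcc = [0.83842049, 0.85032927, 0.83456039, 0.8314851, 0.84698856,
             0.80724303, 0.8154727, 0.81912621, 0.84698856, 0.82718204]
blind_mcc = 0.28  # "MCC on Blind test" as printed (rounded in the log)

cv_mean = fmean(test_mcc)        # mean over the 10 held-out folds
train_mean = fmean(train_mcc)    # mean over the 10 training folds

train_cv_gap = train_mean - cv_mean   # optimism within cross-validation
cv_blind_gap = cv_mean - blind_mcc    # drop from CV to the blind set

print(f"CV mean MCC:      {cv_mean:.4f}")
print(f"Train mean MCC:   {train_mean:.4f}")
print(f"Train - CV gap:   {train_cv_gap:.4f}")
print(f"CV - blind gap:   {cv_blind_gap:.4f}")
```

The train-versus-CV gap is small while the CV-versus-blind gap is large. That pattern may point to a difference between the blind set and the training data rather than to overfitting within the folds, but the log alone does not establish the cause.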
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
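The warning above names two remedies: scaling the data and raising `max_iter`. The logged pipeline already applies `MinMaxScaler`, and `LogisticRegressionCV(random_state=42)` runs with scikit-learn's default `max_iter=100`, so raising the iteration limit is the more direct thing to try. The sketch below is a minimal illustration on synthetic data, not the script's code: the dataset, the value `5000` and the helper name `fit_and_count_warnings` are assumptions.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (NOT the data behind this log).
X, y = make_classification(n_samples=300, n_features=20, random_state=42)


def fit_and_count_warnings(max_iter):
    """Fit a scaled LogisticRegressionCV and count ConvergenceWarnings raised."""
    pipe = Pipeline(steps=[
        ("prep", MinMaxScaler()),
        ("model", LogisticRegressionCV(max_iter=max_iter, random_state=42)),
    ])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, including repeats
        pipe.fit(X, y)
    n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return pipe, n_conv


# Hypothetical, generous iteration limit; tune it for the real data.
pipe, n_warn = fit_and_count_warnings(max_iter=5000)
train_acc = pipe.score(X, y)
print("ConvergenceWarnings raised:", n_warn)
print("Training accuracy:", round(train_acc, 3))
```

Counting the warnings, rather than letting each one print, also keeps the log readable when the same message would otherwise repeat for every fold and regularisation strength.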
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.76443863 0.82875395 0.6898253 0.68586373 0.77263117 0.67486453
0.67990541 0.83567691 0.67201447 0.71364689]
mean value: 0.7317620992660523
key: score_time
value: [0.01203632 0.01211286 0.0111289 0.01215601 0.01238728 0.02110219
0.01246333 0.01241684 0.01234293 0.01244378]
mean value: 0.013059043884277343
key: test_mcc
value: [0.85960591 0.86189955 0.85960591 0.9321832 0.85933785 0.96490128
0.96490128 0.85714286 0.82195294 0.89342711]
mean value: 0.8874957891973448
key: train_mcc
value: [0.94480322 0.92902382 0.93691156 0.93691156 0.95287407 0.93712408
0.94491118 0.94095217 0.93703692 0.93703692]
mean value: 0.9397585529859461
key: test_accuracy
value: [0.92982456 0.92982456 0.92982456 0.96491228 0.92857143 0.98214286
0.98214286 0.92857143 0.91071429 0.94642857]
mean value: 0.943295739348371
key: train_accuracy
value: [0.97238659 0.96449704 0.96844181 0.96844181 0.97637795 0.96850394
0.97244094 0.97047244 0.96850394 0.96850394]
mean value: 0.9698570407988942
key: test_fscore
value: [0.92857143 0.93103448 0.93103448 0.96428571 0.93103448 0.98245614
0.98181818 0.92857143 0.9122807 0.94545455]
mean value: 0.9436541589082423
key: train_fscore
value: [0.97233202 0.96442688 0.96825397 0.96825397 0.97619048 0.96825397
0.97233202 0.9704142 0.96837945 0.96837945]
mean value: 0.9697216384507354
key: test_precision
value: [0.92857143 0.9 0.93103448 1. 0.9 0.96551724
1. 0.92857143 0.89655172 0.96296296]
mean value: 0.9413209268381683
key: train_precision
value: [0.97619048 0.96825397 0.97211155 0.97211155 0.984 0.976
0.97619048 0.97233202 0.97222222 0.97222222]
mean value: 0.9741634488459363
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.96428571 1.
0.96428571 0.92857143 0.92857143 0.92857143]
mean value: 0.9469211822660099
key: train_recall
value: [0.96850394 0.96062992 0.96442688 0.96442688 0.96850394 0.96062992
0.96850394 0.96850394 0.96456693 0.96456693]
mean value: 0.9653263203759609
key: test_roc_auc
value: [0.92980296 0.93041872 0.92980296 0.96551724 0.92857143 0.98214286
0.98214286 0.92857143 0.91071429 0.94642857]
mean value: 0.9434113300492611
key: train_roc_auc
value: [0.97239426 0.96450468 0.96843391 0.96843391 0.97637795 0.96850394
0.97244094 0.97047244 0.96850394 0.96850394]
mean value: 0.9698569916902681
key: test_jcc
value: [0.86666667 0.87096774 0.87096774 0.93103448 0.87096774 0.96551724
0.96428571 0.86666667 0.83870968 0.89655172]
mean value: 0.8942335399120717
key: train_jcc
value: [0.94615385 0.93129771 0.93846154 0.93846154 0.95348837 0.93846154
0.94615385 0.94252874 0.93869732 0.93869732]
mean value: 0.9412401761356505
MCC on Blind test: 0.2
Accuracy on Blind test: 0.59
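The blind-test figures above pair a modest accuracy with a low MCC. As a minimal sketch of how the two metrics relate, the snippet below computes both from confusion-matrix counts. The counts are hypothetical, chosen only so that accuracy comes out at 0.59. The log does not report the actual confusion matrix, and the helper names `accuracy` and `mcc` are illustrative rather than taken from the project's scripts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix cells.

    Returns 0.0 when the denominator is zero (for example, when only one
    class is ever predicted).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (NOT from the log): 100 blind-test samples in which
# the model predicts the positive class too often.
tp, tn, fp, fn = 40, 19, 30, 11

print("accuracy:", round(accuracy(tp, tn, fp, fn), 2))
print("mcc:", round(mcc(tp, tn, fp, fn), 2))
```

Accuracy counts only the correct calls, whereas MCC uses all four cells, so a classifier that over-predicts one class can keep a moderate accuracy while its MCC stays close to zero.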
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01082301 0.01014185 0.00848889 0.00828099 0.00836778 0.00830722
0.00840569 0.00863934 0.00839567 0.00852156]
mean value: 0.008837199211120606
key: score_time
value: [0.01088572 0.00903606 0.0087676 0.00838327 0.0087471 0.00870657
0.00868726 0.00890255 0.00878382 0.00867343]
mean value: 0.008957338333129884
key: test_mcc
value: [0.50927421 0.65018988 0.7366424 0.64889453 0.65814518 0.58501794
0.80439967 0.61706091 0.64951905 0.61706091]
mean value: 0.6476204667782233
key: train_mcc
value: [0.67420459 0.6683308 0.66925612 0.67734922 0.69555499 0.6527166
0.65044798 0.70356186 0.67461719 0.68157216]
mean value: 0.6747611503976669
key: test_accuracy
value: [0.75438596 0.8245614 0.85964912 0.80701754 0.82142857 0.78571429
0.89285714 0.80357143 0.82142857 0.80357143]
mean value: 0.8174185463659148
key: train_accuracy
value: [0.82840237 0.82642998 0.82642998 0.83234714 0.84251969 0.81889764
0.81692913 0.8484252 0.83070866 0.83464567]
mean value: 0.830573545170759
key: test_fscore
value: [0.74074074 0.81481481 0.84615385 0.7755102 0.8 0.76
0.88 0.78431373 0.80769231 0.78431373]
mean value: 0.7993539364463734
key: train_fscore
value: [0.80709534 0.8061674 0.80444444 0.81400438 0.82758621 0.79735683
0.79379157 0.8372093 0.81222707 0.8173913 ]
mean value: 0.8117273855652805
key: test_precision
value: [0.76923077 0.84615385 0.95652174 0.95 0.90909091 0.86363636
1. 0.86956522 0.875 0.86956522]
mean value: 0.8908764062024932
key: train_precision
value: [0.92385787 0.915 0.91878173 0.91176471 0.91428571 0.905
0.90862944 0.90410959 0.91176471 0.91262136]
mean value: 0.9125815109847812
key: test_recall
value: [0.71428571 0.78571429 0.75862069 0.65517241 0.71428571 0.67857143
0.78571429 0.71428571 0.75 0.71428571]
mean value: 0.7270935960591133
key: train_recall
value: [0.71653543 0.72047244 0.71541502 0.73517787 0.75590551 0.71259843
0.70472441 0.77952756 0.73228346 0.74015748]
mean value: 0.7312797609784942
key: test_roc_auc
value: [0.75369458 0.82389163 0.8614532 0.80972906 0.82142857 0.78571429
0.89285714 0.80357143 0.82142857 0.80357143]
mean value: 0.8177339901477833
key: train_roc_auc
value: [0.82862345 0.82663938 0.82621145 0.83215586 0.84251969 0.81889764
0.81692913 0.8484252 0.83070866 0.83464567]
mean value: 0.8305756123369954
key: test_jcc
value: [0.58823529 0.6875 0.73333333 0.63333333 0.66666667 0.61290323
0.78571429 0.64516129 0.67741935 0.64516129]
mean value: 0.6675428074455588
key: train_jcc
value: [0.67657993 0.67527675 0.67286245 0.68634686 0.70588235 0.66300366
0.65808824 0.72 0.68382353 0.69117647]
mean value: 0.6833040246657276
MCC on Blind test: 0.34
Accuracy on Blind test: 0.78
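Each `key:` / `value:` / `mean value:` record above lists ten per-fold scores followed by their average. The sketch below parses records in that layout and recomputes the means, using the Gaussian NB `test_mcc` and `train_mcc` values copied from this log. The function name `parse_cv_block` and the regular expression are assumptions made for illustration and are not part of the project's code. Because the logged fold scores are rounded to eight decimals, the recomputed means should agree with the logged ones only up to that rounding.

```python
import re
from statistics import mean

# Two records copied verbatim from the Gaussian NB block of this log.
LOG_SNIPPET = """\
key: test_mcc
value: [0.50927421 0.65018988 0.7366424 0.64889453 0.65814518 0.58501794
 0.80439967 0.61706091 0.64951905 0.61706091]
mean value: 0.6476204667782233
key: train_mcc
value: [0.67420459 0.6683308 0.66925612 0.67734922 0.69555499 0.6527166
 0.65044798 0.70356186 0.67461719 0.68157216]
mean value: 0.6747611503976669
"""

# A 'key:' line followed by a bracketed, possibly multi-line, 'value:' array.
RECORD = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]")


def parse_cv_block(text):
    """Return {metric name: list of per-fold scores} for each record."""
    return {
        name: [float(token) for token in body.split()]
        for name, body in RECORD.findall(text)
    }


fold_scores = parse_cv_block(LOG_SNIPPET)
fold_means = {name: mean(values) for name, values in fold_scores.items()}

for name, value in fold_means.items():
    print(f"{name}: folds={len(fold_scores[name])} mean={value:.6f}")

# Difference between the mean training and mean cross-validation MCC.
print("train - test MCC gap:",
      round(fold_means["train_mcc"] - fold_means["test_mcc"], 4))
```

Run over a whole log, the same parser would make it straightforward to tabulate the cross-validation means per model and set them against the blind-test lines.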
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00908518 0.00873065 0.00850201 0.00850201 0.00844526 0.00834846
0.00829315 0.00879455 0.00849795 0.00861597]
mean value: 0.00858151912689209
key: score_time
value: [0.00892878 0.0087173 0.00866246 0.00867057 0.00865889 0.00869465
0.0087862 0.00887084 0.00837755 0.00877047]
mean value: 0.008713769912719726
key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.66755025 0.71611487 0.78772636
0.79385662 0.75047877 0.67900461 0.75047877]
mean value: 0.7461171974035183
key: train_mcc
value: [0.77122271 0.76334013 0.76731664 0.68276748 0.78361641 0.76800824
0.76819892 0.77588525 0.78361641 0.77574087]
mean value: 0.763971305717051
key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.80701754 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.868828320802005
key: train_accuracy
value: [0.88560158 0.8816568 0.88362919 0.84023669 0.89173228 0.88385827
0.88385827 0.88779528 0.89173228 0.88779528]
mean value: 0.8817895913898336
key: test_fscore
value: [0.9 0.86666667 0.9 0.76595745 0.86206897 0.88888889
0.88461538 0.87272727 0.84210526 0.87719298]
mean value: 0.8660222870838
key: train_fscore
value: [0.88627451 0.88142292 0.88408644 0.83298969 0.89278752 0.88543689
0.88588008 0.88932039 0.89278752 0.88888889]
mean value: 0.8819874865979285
key: test_precision
value: [0.84375 0.8125 0.87096774 1. 0.83333333 0.92307692
0.95833333 0.88888889 0.82758621 0.86206897]
mean value: 0.8820505392981756
key: train_precision
value: [0.8828125 0.88492063 0.87890625 0.87068966 0.88416988 0.87356322
0.87072243 0.87739464 0.88416988 0.88030888]
mean value: 0.8787657976607903
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.62068966 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8623152709359606
key: train_recall
value: [0.88976378 0.87795276 0.88932806 0.79841897 0.9015748 0.8976378
0.9015748 0.9015748 0.9015748 0.8976378 ]
mean value: 0.8857038374155799
key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.81034483 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.8693349753694581
key: train_roc_auc
value: [0.88559335 0.88166412 0.88364041 0.84015437 0.89173228 0.88385827
0.88385827 0.88779528 0.89173228 0.88779528]
mean value: 0.881782390837509
key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.62068966 0.75757576 0.8
0.79310345 0.77419355 0.72727273 0.78125 ]
mean value: 0.7655154655400436
key: train_jcc
value: [0.79577465 0.78798587 0.79225352 0.71378092 0.80633803 0.79442509
0.79513889 0.8006993 0.80633803 0.8 ]
mean value: 0.7892734286500613
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.0078218 0.00819182 0.00788093 0.00713396 0.00716805 0.00795126
0.00721955 0.00793576 0.00724411 0.00745344]
mean value: 0.007600069046020508
key: score_time
value: [0.01292777 0.01269674 0.01303506 0.01127744 0.01387811 0.01285839
0.01185989 0.0118773 0.0108037 0.01251173]
mean value: 0.012372612953186035
key: test_mcc
value: [0.72706729 0.68850906 0.71921182 0.8953202 0.71611487 0.68250015
0.79385662 0.75047877 0.67900461 0.75047877]
mean value: 0.7402542168196266
key: train_mcc
value: [0.79496359 0.79887642 0.78334713 0.77932046 0.79951627 0.76777009
0.79936749 0.79163927 0.80324922 0.79530025]
mean value: 0.7913350175140176
key: test_accuracy
value: [0.85964912 0.84210526 0.85964912 0.94736842 0.85714286 0.83928571
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.868734335839599
key: train_accuracy
value: [0.8974359 0.89940828 0.89151874 0.88954635 0.8996063 0.88385827
0.8996063 0.89566929 0.9015748 0.8976378 ]
mean value: 0.8955862026122474
key: test_fscore
value: [0.86666667 0.84745763 0.86206897 0.94736842 0.86206897 0.83018868
0.88461538 0.87272727 0.84210526 0.87719298]
mean value: 0.86924602280744
key: train_fscore
value: [0.8984375 0.8990099 0.89278752 0.890625 0.90097087 0.88454012
0.9005848 0.89708738 0.90234375 0.89803922]
mean value: 0.8964426056208497
key: test_precision
value: [0.8125 0.80645161 0.86206897 0.96428571 0.83333333 0.88
0.95833333 0.88888889 0.82758621 0.86206897]
mean value: 0.869551702067553
key: train_precision
value: [0.89147287 0.90438247 0.88076923 0.88030888 0.88888889 0.87937743
0.89189189 0.88505747 0.89534884 0.89453125]
mean value: 0.8892029220575753
key: test_recall
value: [0.92857143 0.89285714 0.86206897 0.93103448 0.89285714 0.78571429
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8721674876847291
key: train_recall
value: [0.90551181 0.89370079 0.90513834 0.90118577 0.91338583 0.88976378
0.90944882 0.90944882 0.90944882 0.9015748 ]
mean value: 0.9038607575238866
key: test_roc_auc
value: [0.86083744 0.8429803 0.85960591 0.9476601 0.85714286 0.83928571
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.8689655172413794
key: train_roc_auc
value: [0.89741994 0.89941956 0.89154555 0.88956926 0.8996063 0.88385827
0.8996063 0.89566929 0.9015748 0.8976378 ]
mean value: 0.8955907067940618
key: test_jcc
value: [0.76470588 0.73529412 0.75757576 0.9 0.75757576 0.70967742
0.79310345 0.77419355 0.72727273 0.78125 ]
mean value: 0.770064865844204
key: train_jcc
value: [0.81560284 0.81654676 0.80633803 0.8028169 0.81978799 0.79298246
0.81914894 0.81338028 0.82206406 0.81494662]
mean value: 0.8123614865069838
MCC on Blind test: 0.25
Accuracy on Blind test: 0.72
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.01474333 0.01426578 0.01444244 0.01453424 0.01460671 0.01486588
0.01472378 0.01466084 0.01448703 0.01444364]
mean value: 0.014577364921569825
key: score_time
value: [0.00919628 0.00896859 0.00908899 0.00891948 0.00902748 0.00907803
0.00912905 0.00929427 0.00912786 0.00900006]
mean value: 0.009083008766174317
key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.89988258 0.71611487 0.78772636
0.79385662 0.78772636 0.67900461 0.71428571]
mean value: 0.776601995146589
key: train_mcc
value: [0.78308641 0.79093074 0.78708603 0.77160078 0.79537422 0.77974514
0.78395685 0.78779242 0.79537422 0.78351922]
mean value: 0.7858466034660538
key: test_accuracy
value: [0.9122807 0.87719298 0.89473684 0.94736842 0.85714286 0.89285714
0.89285714 0.89285714 0.83928571 0.85714286]
mean value: 0.8863721804511278
key: train_accuracy
value: [0.89151874 0.89546351 0.89349112 0.88560158 0.8976378 0.88976378
0.89173228 0.89370079 0.8976378 0.89173228]
mean value: 0.8928279675099784
key: test_fscore
value: [0.91525424 0.8852459 0.9 0.94545455 0.86206897 0.88888889
0.88461538 0.88888889 0.84210526 0.85714286]
mean value: 0.8869664932593181
key: train_fscore
value: [0.89236791 0.89587426 0.89411765 0.88715953 0.8984375 0.89105058
0.89361702 0.89534884 0.8984375 0.89236791]
mean value: 0.8938778697670609
key: test_precision
value: [0.87096774 0.81818182 0.87096774 1. 0.83333333 0.92307692
0.95833333 0.92307692 0.82758621 0.85714286]
mean value: 0.8882666878912708
key: train_precision
value: [0.88715953 0.89411765 0.88715953 0.87356322 0.89147287 0.88076923
0.878327 0.88167939 0.89147287 0.88715953]
mean value: 0.8852880817385453
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.85714286]
mean value: 0.8899014778325123
key: train_recall
value: [0.8976378 0.8976378 0.90118577 0.90118577 0.90551181 0.9015748
0.90944882 0.90944882 0.90551181 0.8976378 ]
mean value: 0.9026780990320874
key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.94827586 0.85714286 0.89285714
0.89285714 0.89285714 0.83928571 0.85714286]
mean value: 0.8866379310344827
key: train_roc_auc
value: [0.89150664 0.89545921 0.89350627 0.88563226 0.8976378 0.88976378
0.89173228 0.89370079 0.8976378 0.89173228]
mean value: 0.8928309109582646
key: test_jcc
value: [0.84375 0.79411765 0.81818182 0.89655172 0.75757576 0.8
0.79310345 0.8 0.72727273 0.75 ]
mean value: 0.798055312250292
key: train_jcc
value: [0.80565371 0.8113879 0.80851064 0.7972028 0.81560284 0.80350877
0.80769231 0.81052632 0.81560284 0.80565371]
mean value: 0.8081341825521712
MCC on Blind test: 0.22
Accuracy on Blind test: 0.71
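The pipeline repr and per-fold metric keys above are consistent with a ColumnTransformer feeding an estimator, evaluated with 10-fold cross-validation that also returns training scores. The following is a minimal sketch of how output with these keys could be produced, assuming scikit-learn's `cross_validate` with a custom scoring dict. The toy data, the reduced column set, the category labels, and all variable names are illustrative assumptions, not the project's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.svm import SVC

# Toy stand-in data (hypothetical): two numeric features and one categorical feature.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.normal(size=n),
    "ddg_foldx": rng.normal(size=n),
    "ss_class": rng.choice(["helix", "strand", "loop"], size=n),
})
y = (X["ligand_distance"] + 0.5 * X["ddg_foldx"] > 0).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Scale numeric columns, one-hot encode categorical ones, pass anything else through.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", SVC(random_state=42))])

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
# All scorers here work on hard class predictions (an assumption of this sketch).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Wrapping the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, so the held-out fold does not leak into the scaling.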
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.39773488 1.62088108 1.47775269 1.50360155 1.53373218 1.47553325
1.55788469 1.53873181 1.53706622 1.49211836]
mean value: 1.5135036706924438
key: score_time
value: [0.01128125 0.01326585 0.01378894 0.01334476 0.01340508 0.01640296
0.01380563 0.01374269 0.0163722 0.01363611]
mean value: 0.013904547691345215
key: test_mcc
value: [0.8951918 0.86189955 0.82880708 0.82490815 0.78772636 0.85933785
0.96490128 0.85714286 0.82195294 0.85714286]
mean value: 0.8559010729919259
key: train_mcc
value: [0.96067294 0.96450468 0.96847232 0.96844169 0.9645744 0.9645744
0.9645744 0.97244848 0.9606597 0.98032256]
mean value: 0.9669245592847295
key: test_accuracy
value: [0.94736842 0.92982456 0.9122807 0.9122807 0.89285714 0.92857143
0.98214286 0.92857143 0.91071429 0.92857143]
mean value: 0.9273182957393483
key: train_accuracy
value: [0.98027613 0.98224852 0.98422091 0.98422091 0.98228346 0.98228346
0.98228346 0.98622047 0.98031496 0.99015748]
mean value: 0.9834509776514623
key: test_fscore
value: [0.94545455 0.93103448 0.91803279 0.91525424 0.89655172 0.92592593
0.98181818 0.92857143 0.9122807 0.92857143]
mean value: 0.928349544316583
key: train_fscore
value: [0.98015873 0.98224852 0.98425197 0.98418972 0.98224852 0.98224852
0.98224852 0.98619329 0.98023715 0.99017682]
mean value: 0.9834201770147663
key: test_precision
value: [0.96296296 0.9 0.875 0.9 0.86666667 0.96153846
1. 0.92857143 0.89655172 0.92857143]
mean value: 0.921986267244888
key: train_precision
value: [0.988 0.98418972 0.98039216 0.98418972 0.98418972 0.98418972
0.98418972 0.98814229 0.98412698 0.98823529]
mean value: 0.9849845344198285
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 0.89285714
0.96428571 0.92857143 0.92857143 0.92857143]
mean value: 0.9360837438423646
key: train_recall
value: [0.97244094 0.98031496 0.98814229 0.98418972 0.98031496 0.98031496
0.98031496 0.98425197 0.97637795 0.99212598]
mean value: 0.9818788708723662
key: test_roc_auc
value: [0.94704433 0.93041872 0.91133005 0.91194581 0.89285714 0.92857143
0.98214286 0.92857143 0.91071429 0.92857143]
mean value: 0.9272167487684729
key: train_roc_auc
value: [0.98029162 0.98225234 0.98422863 0.98422085 0.98228346 0.98228346
0.98228346 0.98622047 0.98031496 0.99015748]
mean value: 0.9834536740219726
key: test_jcc
value: [0.89655172 0.87096774 0.84848485 0.84375 0.8125 0.86206897
0.96428571 0.86666667 0.83870968 0.86666667]
mean value: 0.8670652005113907
key: train_jcc
value: [0.96108949 0.96511628 0.96899225 0.9688716 0.96511628 0.96511628
0.96511628 0.97276265 0.96124031 0.98054475]
mean value: 0.9673966156908878
MCC on Blind test: 0.26
Accuracy on Blind test: 0.66
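The ConvergenceWarning in the MLP run above means the optimizer used its full 500-iteration budget in those fits. Below is a minimal sketch of how such warnings could be counted and of two common adjustments (a larger `max_iter`, or `early_stopping=True`), assuming scikit-learn's `MLPClassifier`. The toy data and the helper name `count_convergence_warnings` are hypothetical, and whether either adjustment helps on the real features is untested here.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Toy stand-in data (hypothetical), not the log's feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)


def count_convergence_warnings(model):
    """Fit the model and return how many ConvergenceWarnings were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X, y)
    return sum(issubclass(w.category, ConvergenceWarning) for w in caught)


# A deliberately tiny budget, so the warning is certain to appear.
n_warn_short = count_convergence_warnings(MLPClassifier(max_iter=1, random_state=42))

# Two options to try: a larger budget, or early stopping on an internal validation split.
n_warn_long = count_convergence_warnings(MLPClassifier(max_iter=2000, random_state=42))
n_warn_early = count_convergence_warnings(
    MLPClassifier(max_iter=500, early_stopping=True, random_state=42)
)

print("max_iter=1:", n_warn_short)
print("max_iter=2000:", n_warn_long)
print("early_stopping=True:", n_warn_early)
```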
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0136857 0.01279187 0.01128888 0.01076555 0.01062655 0.01041436
0.01054716 0.01079988 0.0110817 0.01173425]
mean value: 0.011373591423034669
key: score_time
value: [0.01080513 0.00837135 0.00845194 0.00823951 0.0084095 0.0082202
0.00809073 0.00809741 0.00841331 0.0083375 ]
mean value: 0.008543658256530761
key: test_mcc
value: [0.92980296 0.8953202 0.82942474 0.96551724 0.75047877 0.89342711
0.89342711 0.85933785 0.96490128 0.92857143]
mean value: 0.891020869070053
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.94736842 0.9122807 0.98245614 0.875 0.94642857
0.94642857 0.92857143 0.98214286 0.96428571]
mean value: 0.9449874686716792
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.94736842 0.90909091 0.98245614 0.87719298 0.94736842
0.94545455 0.92592593 0.98181818 0.96428571]
mean value: 0.9445246955773272
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.93103448 0.96153846 1. 0.86206897 0.93103448
0.96296296 0.96153846 1. 0.96428571]
mean value: 0.9538749245645797
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.96551724 0.89285714 0.96428571
0.92857143 0.89285714 0.96428571 0.96428571]
mean value: 0.9363300492610838
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.9476601 0.91317734 0.98275862 0.875 0.94642857
0.94642857 0.92857143 0.98214286 0.96428571]
mean value: 0.9451354679802957
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.9 0.83333333 0.96551724 0.78125 0.9
0.89655172 0.86206897 0.96428571 0.93103448]
mean value: 0.8965075944170772
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.36
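The `test_jcc` and `test_fscore` arrays in these blocks agree with the standard binary identity J = F1 / (2 - F1), which holds because both scores are built from the same TP, FP and FN counts. A small check against the first three Decision Tree folds logged above (the helper name `jaccard_from_f1` is hypothetical):

```python
def jaccard_from_f1(f1: float) -> float:
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# First three Decision Tree folds, copied from the log above.
test_fscore = [0.96428571, 0.94736842, 0.90909091]
test_jcc = [0.93103448, 0.9, 0.83333333]

for f1, jcc in zip(test_fscore, test_jcc):
    derived = jaccard_from_f1(f1)
    # The logged values are rounded to 8 decimals, so compare with a tolerance.
    assert abs(derived - jcc) < 1e-6
    print(f"F1={f1:.8f} -> Jaccard={derived:.8f} (logged {jcc:.8f})")
```

Because one score is a monotonic function of the other, the Jaccard column ranks folds and models exactly as the F-score column does.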
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.10608721 0.10524035 0.10447693 0.1020844 0.10217381 0.10169768
0.10213351 0.1026423 0.10438013 0.10096812]
mean value: 0.10318844318389893
key: score_time
value: [0.01834702 0.01700187 0.01766229 0.0172255 0.01845121 0.0170753
0.0172298 0.01727653 0.01692057 0.01812077]
mean value: 0.01753108501434326
key: test_mcc
value: [0.82942474 0.86189955 0.8615634 0.8953202 0.78772636 0.93094934
0.89802651 0.78772636 0.75047877 0.85933785]
mean value: 0.84624530759693
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9122807 0.92982456 0.92982456 0.94736842 0.89285714 0.96428571
0.94642857 0.89285714 0.875 0.92857143]
mean value: 0.9219298245614035
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91525424 0.93103448 0.93333333 0.94736842 0.89655172 0.96296296
0.94339623 0.88888889 0.87719298 0.93103448]
mean value: 0.9227017742052359
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.87096774 0.9 0.90322581 0.96428571 0.86666667 1.
1. 0.92307692 0.86206897 0.9 ]
mean value: 0.9190291817933642
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.92857143 0.92857143
0.89285714 0.85714286 0.89285714 0.96428571]
mean value: 0.9289408866995074
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91317734 0.93041872 0.92918719 0.9476601 0.89285714 0.96428571
0.94642857 0.89285714 0.875 0.92857143]
mean value: 0.9220443349753695
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84375 0.87096774 0.875 0.9 0.8125 0.92857143
0.89285714 0.8 0.78125 0.87096774]
mean value: 0.8575864055299539
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.71
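Each model block repeats the same `key:` / `value:` / `mean value:` layout, with value arrays wrapping over two lines. Here is a minimal sketch of how such a block could be read back into Python for comparison across models, using only the standard library. The function name `parse_cv_block` is hypothetical, and the sample text is copied from the Extra Trees block above.

```python
import re
from statistics import mean


def parse_cv_block(text: str) -> dict:
    """Parse 'key: NAME' / 'value: [...]' pairs into {NAME: [floats]}."""
    results = {}
    # A value array may wrap across lines, so match up to the closing bracket.
    pattern = re.compile(r"key:\s*(\w+)\s*value:\s*\[([^\]]*)\]")
    for key, body in pattern.findall(text):
        results[key] = [float(tok) for tok in body.split()]
    return results


sample = """key: test_mcc
value: [0.82942474 0.86189955 0.8615634 0.8953202 0.78772636 0.93094934
0.89802651 0.78772636 0.75047877 0.85933785]
mean value: 0.84624530759693
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

parsed = parse_cv_block(sample)
test_mean = mean(parsed["test_mcc"])
train_mean = mean(parsed["train_mcc"])

# Gap between training and cross-validated test MCC for this block.
print("folds:", len(parsed["test_mcc"]))
print("mean test_mcc:", test_mean)
print("train/test gap:", train_mean - test_mean)
```

Recomputing the mean from the parsed folds reproduces the logged `mean value` up to the rounding of the printed array.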
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00839448 0.00779438 0.00789499 0.00782299 0.00828099 0.00764513
0.0078218 0.00805545 0.00872326 0.00792527]
mean value: 0.008035874366760254
key: score_time
value: [0.0083375 0.00801182 0.00785446 0.0080626 0.0083189 0.00805974
0.00803876 0.00792432 0.00809288 0.00801706]
mean value: 0.008071804046630859
key: test_mcc
value: [0.79161589 0.68850906 0.72133224 0.54592083 0.4645821 0.61065803
0.79385662 0.68250015 0.64285714 0.62705445]
mean value: 0.65688865057222
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89473684 0.84210526 0.85964912 0.77192982 0.73214286 0.80357143
0.89285714 0.83928571 0.82142857 0.80357143]
mean value: 0.8261278195488722
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.89655172 0.84745763 0.85714286 0.78688525 0.72727273 0.79245283
0.88461538 0.83018868 0.82142857 0.7755102 ]
mean value: 0.821950585113335
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86666667 0.80645161 0.88888889 0.75 0.74074074 0.84
0.95833333 0.88 0.82142857 0.9047619 ]
mean value: 0.8457271718723331
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.92857143 0.89285714 0.82758621 0.82758621 0.71428571 0.75
0.82142857 0.78571429 0.82142857 0.67857143]
mean value: 0.8048029556650247
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8953202 0.8429803 0.86022167 0.77093596 0.73214286 0.80357143
0.89285714 0.83928571 0.82142857 0.80357143]
mean value: 0.826231527093596
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8125 0.73529412 0.75 0.64864865 0.57142857 0.65625
0.79310345 0.70967742 0.6969697 0.63333333]
mean value: 0.7007205235658011
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.23
Accuracy on Blind test: 0.71
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.3137598 1.31105161 1.31397271 1.34022093 1.33495617 1.32320976
1.32298803 1.31964326 1.33588552 1.33314967]
mean value: 1.3248837471008301
key: score_time
value: [0.09023738 0.0960989 0.0929544 0.09687686 0.09749436 0.092448
0.09450769 0.09734035 0.09717226 0.09090662]
mean value: 0.09460368156433105
key: test_mcc
value: [0.92980296 0.92980296 0.8951918 0.9321832 0.85933785 0.96490128
0.96490128 0.92857143 0.89342711 0.89342711]
mean value: 0.9191546978182543
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.96491228 0.92857143 0.98214286
0.98214286 0.96428571 0.94642857 0.94642857]
mean value: 0.9592105263157895
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.96428571 0.94915254 0.96428571 0.93103448 0.98245614
0.98181818 0.96428571 0.94545455 0.94736842]
mean value: 0.9594427170950596
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96428571 0.93333333 1. 0.9 0.96551724
1. 0.96428571 0.96296296 0.93103448]
mean value: 0.958570516329137
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.96428571 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9610837438423645
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.96490148 0.94704433 0.96551724 0.92857143 0.98214286
0.98214286 0.96428571 0.94642857 0.94642857]
mean value: 0.9592364532019705
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.93103448 0.90322581 0.93103448 0.87096774 0.96551724
0.96428571 0.93103448 0.89655172 0.9 ]
mean value: 0.9224686159224535
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.2
Accuracy on Blind test: 0.5
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.87169266 0.93242407 0.9023416 1.00182915 0.90367198 0.91859269
0.90991735 0.90380979 0.88591456 0.89217138]
mean value: 0.9122365236282348
key: score_time
value: [0.22929215 0.26355243 0.24770474 0.25745416 0.18990588 0.25744367
0.27588391 0.26097345 0.2340591 0.21919918]
mean value: 0.24354686737060546
key: test_mcc
value: [0.8953202 0.92980296 0.8951918 0.9321832 0.85933785 0.96490128
0.96490128 0.96490128 0.89342711 0.89342711]
mean value: 0.919339407234444
key: train_mcc
value: [0.95679178 0.94890036 0.94878539 0.94089544 0.9606597 0.94900279
0.94112724 0.94499908 0.94888508 0.95687833]
mean value: 0.9496925191019135
key: test_accuracy
value: [0.94736842 0.96491228 0.94736842 0.96491228 0.92857143 0.98214286
0.98214286 0.98214286 0.94642857 0.94642857]
mean value: 0.9592418546365915
key: train_accuracy
value: [0.97830375 0.97435897 0.97435897 0.9704142 0.98031496 0.97440945
0.97047244 0.97244094 0.97440945 0.97834646]
mean value: 0.9747829598223299
key: test_fscore
value: [0.94736842 0.96428571 0.94915254 0.96428571 0.93103448 0.98245614
0.98181818 0.98181818 0.94545455 0.94736842]
mean value: 0.959504234524998
key: train_fscore
value: [0.9785575 0.97465887 0.97445972 0.97053045 0.98039216 0.97465887
0.97076023 0.97265625 0.97455969 0.9785575 ]
mean value: 0.9749791253024628
key: test_precision
value: [0.93103448 0.96428571 0.93333333 1. 0.9 0.96551724
1. 1. 0.96296296 0.93103448]
mean value: 0.9588168217478562
key: train_precision
value: [0.96911197 0.96525097 0.96875 0.96484375 0.9765625 0.96525097
0.96138996 0.96511628 0.9688716 0.96911197]
mean value: 0.9674259954516337
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.93103448 0.96428571 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.9610837438423645
key: train_recall
value: [0.98818898 0.98425197 0.98023715 0.97628458 0.98425197 0.98425197
0.98031496 0.98031496 0.98031496 0.98818898]
mean value: 0.9826600479287916
key: test_roc_auc
value: [0.9476601 0.96490148 0.94704433 0.96551724 0.92857143 0.98214286
0.98214286 0.98214286 0.94642857 0.94642857]
mean value: 0.9592980295566503
key: train_roc_auc
value: [0.97828421 0.97433942 0.97437055 0.97042576 0.98031496 0.97440945
0.97047244 0.97244094 0.97440945 0.97834646]
mean value: 0.9747813637919767
key: test_jcc
value: [0.9 0.93103448 0.90322581 0.93103448 0.87096774 0.96551724
0.96428571 0.96428571 0.89655172 0.9 ]
mean value: 0.9226902907993009
key: train_jcc
value: [0.95801527 0.95057034 0.95019157 0.94274809 0.96153846 0.95057034
0.94318182 0.94676806 0.95038168 0.95801527]
mean value: 0.9511980901192165
MCC on Blind test: 0.2
Accuracy on Blind test: 0.5
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02086926 0.00807834 0.00846243 0.00790358 0.00789189 0.0084095
0.00794506 0.00821877 0.00814724 0.00783157]
mean value: 0.009375762939453126
key: score_time
value: [0.01121163 0.00862956 0.00874233 0.00838256 0.00837135 0.00809836
0.00858903 0.00843906 0.00876284 0.00867867]
mean value: 0.008790540695190429
key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.66755025 0.71611487 0.78772636
0.79385662 0.75047877 0.67900461 0.75047877]
mean value: 0.7461171974035183
key: train_mcc
value: [0.77122271 0.76334013 0.76731664 0.68276748 0.78361641 0.76800824
0.76819892 0.77588525 0.78361641 0.77574087]
mean value: 0.763971305717051
key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.80701754 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.868828320802005
key: train_accuracy
value: [0.88560158 0.8816568 0.88362919 0.84023669 0.89173228 0.88385827
0.88385827 0.88779528 0.89173228 0.88779528]
mean value: 0.8817895913898336
key: test_fscore
value: [0.9 0.86666667 0.9 0.76595745 0.86206897 0.88888889
0.88461538 0.87272727 0.84210526 0.87719298]
mean value: 0.8660222870838
key: train_fscore
value: [0.88627451 0.88142292 0.88408644 0.83298969 0.89278752 0.88543689
0.88588008 0.88932039 0.89278752 0.88888889]
mean value: 0.8819874865979285
key: test_precision
value: [0.84375 0.8125 0.87096774 1. 0.83333333 0.92307692
0.95833333 0.88888889 0.82758621 0.86206897]
mean value: 0.8820505392981756
key: train_precision
value: [0.8828125 0.88492063 0.87890625 0.87068966 0.88416988 0.87356322
0.87072243 0.87739464 0.88416988 0.88030888]
mean value: 0.8787657976607903
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.62068966 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8623152709359606
key: train_recall
value: [0.88976378 0.87795276 0.88932806 0.79841897 0.9015748 0.8976378
0.9015748 0.9015748 0.9015748 0.8976378 ]
mean value: 0.8857038374155799
key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.81034483 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.8693349753694581
key: train_roc_auc
value: [0.88559335 0.88166412 0.88364041 0.84015437 0.89173228 0.88385827
0.88385827 0.88779528 0.89173228 0.88779528]
mean value: 0.881782390837509
key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.62068966 0.75757576 0.8
0.79310345 0.77419355 0.72727273 0.78125 ]
mean value: 0.7655154655400436
key: train_jcc
value: [0.79577465 0.78798587 0.79225352 0.71378092 0.80633803 0.79442509
0.79513889 0.8006993 0.80633803 0.8 ]
mean value: 0.7892734286500613
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.06513762 0.05543566 0.05926275 0.05869985 0.05539632 0.05809283
0.05878782 0.06239796 0.0591898 0.21499252]
mean value: 0.07473931312561036
key: score_time
value: [0.01001787 0.00966692 0.00963521 0.00965786 0.0098114 0.0097878
0.00974226 0.00981474 0.00976157 0.01011968]
mean value: 0.009801530838012695
key: test_mcc
value: [0.92980296 0.92980296 0.92980296 0.96551724 0.82618439 0.93094934
1. 0.92857143 0.96490128 0.89342711]
mean value: 0.9298959656239084
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.96491228 0.96491228 0.98245614 0.91071429 0.96428571
1. 0.96428571 0.98214286 0.94642857]
mean value: 0.9645050125313284
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.96428571 0.96551724 0.98245614 0.91525424 0.96551724
1. 0.96428571 0.98181818 0.94736842]
mean value: 0.965078860612559
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96428571 0.96551724 1. 0.87096774 0.93333333
1. 0.96428571 1. 0.93103448]
mean value: 0.9593709942263892
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 0.96551724 0.96428571 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.9716748768472907
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.96490148 0.96490148 0.98275862 0.91071429 0.96428571
1. 0.96428571 0.98214286 0.94642857]
mean value: 0.9645320197044336
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.93103448 0.93333333 0.96551724 0.84375 0.93333333
1. 0.93103448 0.96428571 0.9 ]
mean value: 0.9333323070607553
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.37
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0158906 0.04133749 0.04191256 0.04180241 0.04169393 0.04146218
0.04147553 0.03954792 0.04155207 0.04123378]
mean value: 0.03879084587097168
key: score_time
value: [0.010324 0.01102901 0.0110786 0.01924562 0.02003503 0.02148247
0.01090479 0.02042723 0.02186942 0.01970553]
mean value: 0.016610169410705568
key: test_mcc
value: [0.82512315 0.76689254 0.79110556 0.9321832 0.75434227 0.82195294
0.89802651 0.85933785 0.67900461 0.82195294]
mean value: 0.8149921569819407
key: train_mcc
value: [0.87014673 0.87419439 0.85823465 0.85931426 0.87499279 0.85486752
0.83910959 0.86274648 0.87089581 0.85105352]
mean value: 0.8615555753216068
key: test_accuracy
value: [0.9122807 0.87719298 0.89473684 0.96491228 0.875 0.91071429
0.94642857 0.92857143 0.83928571 0.91071429]
mean value: 0.905983709273183
key: train_accuracy
value: [0.93491124 0.93688363 0.92899408 0.92899408 0.93700787 0.92716535
0.91929134 0.93110236 0.93503937 0.92519685]
mean value: 0.9304586187081645
key: test_fscore
value: [0.9122807 0.8852459 0.9 0.96428571 0.88135593 0.90909091
0.94339623 0.92592593 0.84210526 0.90909091]
mean value: 0.9072777483563568
key: train_fscore
value: [0.93592233 0.9379845 0.9296875 0.93076923 0.93846154 0.92843327
0.92069632 0.93230174 0.93641618 0.92664093]
mean value: 0.9317313541686736
key: test_precision
value: [0.89655172 0.81818182 0.87096774 1. 0.83870968 0.92592593
1. 0.96153846 0.82758621 0.92592593]
mean value: 0.9065387481961453
key: train_precision
value: [0.92337165 0.92366412 0.91891892 0.90636704 0.91729323 0.91254753
0.90494297 0.91634981 0.91698113 0.90909091]
mean value: 0.9149527308196
key: test_recall
value: [0.92857143 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.89285714 0.89285714 0.85714286 0.89285714]
mean value: 0.9112068965517242
key: train_recall
value: [0.9488189 0.95275591 0.94071146 0.95652174 0.96062992 0.94488189
0.93700787 0.9488189 0.95669291 0.94488189]
mean value: 0.9491721390557406
key: test_roc_auc
value: [0.91256158 0.87869458 0.89408867 0.96551724 0.875 0.91071429
0.94642857 0.92857143 0.83928571 0.91071429]
mean value: 0.9061576354679803
key: train_roc_auc
value: [0.93488376 0.93685226 0.92901715 0.92904827 0.93700787 0.92716535
0.91929134 0.93110236 0.93503937 0.92519685]
mean value: 0.9304604587470044
key: test_jcc
value: [0.83870968 0.79411765 0.81818182 0.93103448 0.78787879 0.83333333
0.89285714 0.86206897 0.72727273 0.83333333]
mean value: 0.8318787915611183
key: train_jcc
value: [0.87956204 0.88321168 0.86861314 0.8705036 0.88405797 0.86642599
0.85304659 0.87318841 0.88043478 0.86330935]
mean value: 0.8722353558136309
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01021671 0.01008439 0.00848007 0.00817442 0.00810814 0.00807619
0.00805545 0.00815272 0.00819159 0.00817704]
mean value: 0.008571672439575195
key: score_time
value: [0.01060987 0.00964761 0.0087111 0.00849438 0.00838518 0.00844193
0.00843549 0.00848222 0.0084908 0.00846648]
mean value: 0.00881650447845459
key: test_mcc
value: [0.79778885 0.72706729 0.79110556 0.89988258 0.71611487 0.78772636
0.79385662 0.75047877 0.67900461 0.75047877]
mean value: 0.7693504297544995
key: train_mcc
value: [0.76726164 0.78700923 0.77122983 0.7514861 0.77955173 0.77186893
0.77203657 0.77574087 0.78351922 0.77174925]
mean value: 0.7731453348144388
key: test_accuracy
value: [0.89473684 0.85964912 0.89473684 0.94736842 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.8828634085213033
key: train_accuracy
value: [0.88362919 0.89349112 0.88560158 0.87573964 0.88976378 0.88582677
0.88582677 0.88779528 0.89173228 0.88582677]
mean value: 0.8865233192004845
key: test_fscore
value: [0.9 0.86666667 0.9 0.94545455 0.86206897 0.88888889
0.88461538 0.87272727 0.84210526 0.87719298]
mean value: 0.8839719969484034
key: train_fscore
value: [0.88408644 0.89328063 0.88582677 0.87573964 0.89019608 0.88715953
0.8875969 0.88888889 0.89236791 0.88671875]
mean value: 0.8871861548728417
key: test_precision
value: [0.84375 0.8125 0.87096774 1. 0.83333333 0.92307692
0.95833333 0.88888889 0.82758621 0.86206897]
mean value: 0.8820505392981756
key: train_precision
value: [0.88235294 0.8968254 0.88235294 0.87401575 0.88671875 0.87692308
0.8740458 0.88030888 0.88715953 0.87984496]
mean value: 0.8820548030282749
key: test_recall
value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286
0.82142857 0.85714286 0.85714286 0.89285714]
mean value: 0.8899014778325123
key: train_recall
value: [0.88582677 0.88976378 0.88932806 0.87747036 0.89370079 0.8976378
0.9015748 0.8976378 0.8976378 0.89370079]
mean value: 0.8924278733932962
key: test_roc_auc
value: [0.89593596 0.86083744 0.89408867 0.94827586 0.85714286 0.89285714
0.89285714 0.875 0.83928571 0.875 ]
mean value: 0.883128078817734
key: train_roc_auc
value: [0.88362485 0.89349849 0.88560891 0.87574305 0.88976378 0.88582677
0.88582677 0.88779528 0.89173228 0.88582677]
mean value: 0.8865246957766643
key: test_jcc
value: [0.81818182 0.76470588 0.81818182 0.89655172 0.75757576 0.8
0.79310345 0.77419355 0.72727273 0.78125 ]
mean value: 0.7931016724365952
key: train_jcc
value: [0.79225352 0.80714286 0.795053 0.77894737 0.80212014 0.7972028
0.79790941 0.8 0.80565371 0.79649123]
mean value: 0.7972774034752823
MCC on Blind test: 0.28
Accuracy on Blind test: 0.71
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01202488 0.01277637 0.0122776 0.01319098 0.013026 0.01311898
0.01343989 0.01440072 0.01275349 0.01293349]
mean value: 0.01299424171447754
key: score_time
value: [0.00864363 0.00991464 0.0099988 0.01055336 0.01052094 0.01076031
0.01056218 0.01061869 0.01053238 0.0105021 ]
mean value: 0.010260701179504395
key: test_mcc
value: [0.7589669 0.82942474 0.30469361 0.9321832 0.26997462 0.6882472
0.26997462 0.76225171 0.82195294 0.85933785]
mean value: 0.649700739821353
key: train_mcc
value: [0.88439556 0.87825675 0.35307124 0.8935508 0.46259784 0.65176051
0.33210739 0.86516672 0.88616336 0.86094079]
mean value: 0.7068010966004633
key: test_accuracy
value: [0.87719298 0.9122807 0.57894737 0.96491228 0.58928571 0.82142857
0.58928571 0.875 0.91071429 0.92857143]
mean value: 0.8047619047619048
key: train_accuracy
value: [0.9408284 0.93885602 0.61143984 0.94674556 0.68110236 0.8011811
0.6023622 0.93110236 0.94291339 0.92913386]
mean value: 0.8325665098075758
key: test_fscore
value: [0.88135593 0.91525424 0.29411765 0.96428571 0.7012987 0.84848485
0.7012987 0.8852459 0.9122807 0.93103448]
mean value: 0.8034656868070665
key: train_fscore
value: [0.94318182 0.94003868 0.36245955 0.94632207 0.75675676 0.83305785
0.71468927 0.93383743 0.94211577 0.93181818]
mean value: 0.8304277370347289
key: test_precision
value: [0.83870968 0.87096774 1. 1. 0.55102041 0.73684211
0.55102041 0.81818182 0.89655172 0.9 ]
mean value: 0.8163293883264277
key: train_precision
value: [0.90875912 0.92395437 1. 0.952 0.61165049 0.71794872
0.55726872 0.89818182 0.95546559 0.89781022]
mean value: 0.8423039046768191
key: test_recall
value: [0.92857143 0.96428571 0.17241379 0.93103448 0.96428571 1.
0.96428571 0.96428571 0.92857143 0.96428571]
mean value: 0.8782019704433498
key: train_recall
value: [0.98031496 0.95669291 0.22134387 0.94071146 0.99212598 0.99212598
0.99606299 0.97244094 0.92913386 0.96850394]
mean value: 0.8949456910771529
key: test_roc_auc
value: [0.87807882 0.91317734 0.5862069 0.96551724 0.58928571 0.82142857
0.58928571 0.875 0.91071429 0.92857143]
mean value: 0.8057266009852218
key: train_roc_auc
value: [0.94075037 0.93882076 0.61067194 0.94673368 0.68110236 0.8011811
0.6023622 0.93110236 0.94291339 0.92913386]
mean value: 0.832477202701441
key: test_jcc
value: [0.78787879 0.84375 0.17241379 0.93103448 0.54 0.73684211
0.54 0.79411765 0.83870968 0.87096774]
mean value: 0.7055714235417677
key: train_jcc
value: [0.89247312 0.88686131 0.22134387 0.89811321 0.60869565 0.71388102
0.55604396 0.87588652 0.89056604 0.87234043]
mean value: 0.7416205129351496
MCC on Blind test: 0.17
Accuracy on Blind test: 0.53
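The per-fold scores above can be cross-checked by hand. As a minimal sketch, assume the third CV fold of the Passive Aggresive run held 57 samples (29 positive, 28 negative) and that the classifier produced 5 true positives and no false positives. These counts are inferred from the logged ratios and are not printed anywhere in this log. Under that assumption, the standard confusion-matrix formulas reproduce the logged fold-3 test values. The helper name `binary_metrics` is hypothetical; the dictionary keys mirror the log's metric names.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification scores from confusion-matrix counts.

    Assumes non-degenerate counts (no zero denominators).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        # ROC AUC of hard 0/1 predictions reduces to the mean of TPR and TNR.
        "roc_auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# Assumed fold-3 counts (inferred from the logged ratios, see lead-in).
fold3 = binary_metrics(tp=5, fp=0, fn=24, tn=28)

# Third entry of each test_* array in the Passive Aggresive block above.
logged = {
    "mcc": 0.30469361,
    "accuracy": 0.57894737,
    "fscore": 0.29411765,
    "precision": 1.0,
    "recall": 0.17241379,
    "roc_auc": 0.5862069,
    "jcc": 0.17241379,
}

for key, value in logged.items():
    # Logged values are rounded to 8 decimals, so compare with a tolerance.
    assert math.isclose(fold3[key], value, abs_tol=1e-6), key
    print(f"test_{key}: {fold3[key]:.8f}")
```

The fact that the logged roc_auc matches (recall + specificity) / 2 is consistent with ROC AUC having been computed from hard class labels rather than from probability or decision scores. The log itself does not confirm this.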
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01449275 0.01378703 0.01423025 0.01384568 0.01323628 0.01458144
0.01555753 0.01341605 0.01425099 0.01479554]
mean value: 0.014219355583190919
key: score_time
value: [0.01101327 0.01102638 0.01097178 0.01104283 0.01102424 0.01096249
0.01104045 0.01096034 0.01094913 0.01095819]
mean value: 0.010994911193847656
key: test_mcc
value: [0.85960591 0.92980296 0.8615634 0.9321832 0.76225171 0.96490128
0.93094934 0.89342711 0.78772636 0.82618439]
mean value: 0.8748595658423624
key: train_mcc
value: [0.90933143 0.9215681 0.86053354 0.89231105 0.86150531 0.91030286
0.90951226 0.87252327 0.91349911 0.86883933]
mean value: 0.8919926257747095
key: test_accuracy
value: [0.92982456 0.96491228 0.92982456 0.96491228 0.875 0.98214286
0.96428571 0.94642857 0.89285714 0.91071429]
mean value: 0.9360902255639098
key: train_accuracy
value: [0.95463511 0.96055227 0.9270217 0.94477318 0.92913386 0.95472441
0.95472441 0.93503937 0.95669291 0.93307087]
mean value: 0.9450368075292364
key: test_fscore
value: [0.92857143 0.96428571 0.93333333 0.96428571 0.8852459 0.98181818
0.96296296 0.94736842 0.89655172 0.91525424]
mean value: 0.9379677619375378
key: train_fscore
value: [0.95499022 0.96 0.9310987 0.94238683 0.93207547 0.95372233
0.95445545 0.9373814 0.95703125 0.93560606]
mean value: 0.9458747709029058
key: test_precision
value: [0.92857143 0.96428571 0.90322581 1. 0.81818182 1.
1. 0.93103448 0.86666667 0.87096774]
mean value: 0.9282933658851346
key: train_precision
value: [0.94941634 0.97560976 0.88028169 0.98283262 0.89492754 0.97530864
0.96015936 0.9047619 0.9496124 0.90145985]
mean value: 0.937437010931088
key: test_recall
value: [0.92857143 0.96428571 0.96551724 0.93103448 0.96428571 0.96428571
0.92857143 0.96428571 0.92857143 0.96428571]
mean value: 0.9503694581280788
key: train_recall
value: [0.96062992 0.94488189 0.98814229 0.90513834 0.97244094 0.93307087
0.9488189 0.97244094 0.96456693 0.97244094]
mean value: 0.9562571970993744
key: test_roc_auc
value: [0.92980296 0.96490148 0.92918719 0.96551724 0.875 0.98214286
0.96428571 0.94642857 0.89285714 0.91071429]
mean value: 0.9360837438423646
key: train_roc_auc
value: [0.95462326 0.96058324 0.92714201 0.94469515 0.92913386 0.95472441
0.95472441 0.93503937 0.95669291 0.93307087]
mean value: 0.9450429491768074
key: test_jcc
value: [0.86666667 0.93103448 0.875 0.93103448 0.79411765 0.96428571
0.92857143 0.9 0.8125 0.84375 ]
mean value: 0.8846960422099874
key: train_jcc
value: [0.91385768 0.92307692 0.87108014 0.89105058 0.87279152 0.91153846
0.91287879 0.88214286 0.917603 0.87900356]
mean value: 0.8975023504978233
MCC on Blind test: 0.12
Accuracy on Blind test: 0.42
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.11413288 0.1020143 0.10187316 0.10205579 0.10220885 0.10228324
0.10212898 0.10206866 0.10222936 0.10228419]
mean value: 0.10332794189453125
key: score_time
value: [0.01537633 0.01542163 0.01549554 0.01547527 0.01547194 0.01569366
0.01554465 0.01544762 0.01546764 0.01553702]
mean value: 0.015493130683898926
key: test_mcc
value: [0.92980296 0.8951918 0.96547546 0.96551724 0.82618439 1.
1. 0.92857143 0.92857143 0.89342711]
mean value: 0.9332741814992628
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.94736842 0.98245614 0.98245614 0.91071429 1.
1. 0.96428571 0.96428571 0.94642857]
mean value: 0.9662907268170426
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.94545455 0.98305085 0.98245614 0.91525424 1.
1. 0.96428571 0.96428571 0.94736842]
mean value: 0.966644133446096
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96296296 0.96666667 1. 0.87096774 1.
1. 0.96428571 0.96428571 0.93103448]
mean value: 0.9624488997180877
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.92857143 1. 0.96551724 0.96428571 1.
1. 0.96428571 0.96428571 0.96428571]
mean value: 0.971551724137931
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.94704433 0.98214286 0.98275862 0.91071429 1.
1. 0.96428571 0.96428571 0.94642857]
mean value: 0.966256157635468
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.89655172 0.96666667 0.96551724 0.84375 1.
1. 0.93103448 0.93103448 0.9 ]
mean value: 0.936558908045977
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.39
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03648138 0.03345275 0.03663468 0.03281045 0.03574181 0.03963566
0.04988575 0.04006696 0.03826404 0.04667592]
mean value: 0.03896493911743164
key: score_time
value: [0.0171628 0.0221951 0.02012706 0.02416539 0.02962255 0.03259301
0.0243876 0.07890892 0.02576041 0.01878405]
mean value: 0.029370689392089845
key: test_mcc
value: [0.92980296 0.8951918 0.8951918 1. 0.82195294 0.89802651
0.96490128 0.92857143 0.92857143 0.89342711]
mean value: 0.915563726523076
key: train_mcc
value: [0.99606293 0.98425123 0.97636129 0.99606299 0.98819663 0.99607071
0.99212598 0.98819663 0.98819663 0.99607071]
mean value: 0.9901595758977872
key: test_accuracy
value: [0.96491228 0.94736842 0.94736842 1. 0.91071429 0.94642857
0.98214286 0.96428571 0.96428571 0.94642857]
mean value: 0.9573934837092731
key: train_accuracy
value: [0.99802761 0.99211045 0.98816568 0.99802761 0.99409449 0.9980315
0.99606299 0.99409449 0.99409449 0.9980315 ]
mean value: 0.9950740809765644
key: test_fscore
value: [0.96428571 0.94545455 0.94915254 1. 0.9122807 0.94915254
0.98181818 0.96428571 0.96428571 0.94736842]
mean value: 0.957808407768265
key: train_fscore
value: [0.99803536 0.99215686 0.98809524 0.99802761 0.99410609 0.99803536
0.99606299 0.99408284 0.99410609 0.99802761]
mean value: 0.9950736067689547
key: test_precision
value: [0.96428571 0.96296296 0.93333333 1. 0.89655172 0.90322581
1. 0.96428571 0.96428571 0.93103448]
mean value: 0.9519965452501604
key: train_precision
value: [0.99607843 0.98828125 0.99203187 0.99606299 0.99215686 0.99607843
0.99606299 0.99604743 0.99215686 1. ]
mean value: 0.9944957125827263
key: test_recall
value: [0.96428571 0.92857143 0.96551724 1. 0.92857143 1.
0.96428571 0.96428571 0.96428571 0.96428571]
mean value: 0.9644088669950739
key: train_recall
value: [1. 0.99606299 0.98418972 1. 0.99606299 1.
0.99606299 0.99212598 0.99606299 0.99606299]
mean value: 0.9956630668202048
key: test_roc_auc
value: [0.96490148 0.94704433 0.94704433 1. 0.91071429 0.94642857
0.98214286 0.96428571 0.96428571 0.94642857]
mean value: 0.9573275862068966
key: train_roc_auc
value: [0.99802372 0.99210264 0.98815785 0.9980315 0.99409449 0.9980315
0.99606299 0.99409449 0.99409449 0.9980315 ]
mean value: 0.9950725156391025
key: test_jcc
value: [0.93103448 0.89655172 0.90322581 1. 0.83870968 0.90322581
0.96428571 0.93103448 0.93103448 0.9 ]
mean value: 0.9199102177022088
key: train_jcc
value: [0.99607843 0.9844358 0.97647059 0.99606299 0.98828125 0.99607843
0.99215686 0.98823529 0.98828125 0.99606299]
mean value: 0.9902143889760475
MCC on Blind test: 0.09
Accuracy on Blind test: 0.38
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16053271 0.1920464 0.18025184 0.14875555 0.09875679 0.1012013
0.16610289 0.17586374 0.18154573 0.15119648]
mean value: 0.1556253433227539
key: score_time
value: [0.02007532 0.02150774 0.02181888 0.01334596 0.02305841 0.01333928
0.02570271 0.02890587 0.01332211 0.02642632]
mean value: 0.020750260353088378
key: test_mcc
value: [0.76689254 0.79778885 0.75462449 0.8953202 0.71611487 0.78772636
0.82618439 0.75047877 0.71611487 0.78772636]
mean value: 0.7798971716204695
key: train_mcc
value: [0.84667632 0.85019923 0.85012683 0.84728344 0.85850727 0.84698856
0.8231473 0.84725158 0.8742597 0.86237183]
mean value: 0.8506812043667749
key: test_accuracy
value: [0.87719298 0.89473684 0.87719298 0.94736842 0.85714286 0.89285714
0.91071429 0.875 0.85714286 0.89285714]
mean value: 0.8882205513784461
key: train_accuracy
value: [0.92307692 0.92504931 0.92504931 0.92307692 0.92913386 0.92322835
0.91141732 0.92322835 0.93700787 0.93110236]
mean value: 0.9251370575719455
key: test_fscore
value: [0.8852459 0.9 0.88135593 0.94736842 0.86206897 0.88888889
0.90566038 0.87272727 0.86206897 0.89655172]
mean value: 0.8901936449042431
key: train_fscore
value: [0.9245648 0.92578125 0.92519685 0.92485549 0.92996109 0.9245648
0.91262136 0.92485549 0.93774319 0.93177388]
mean value: 0.9261918195384349
key: test_precision
value: [0.81818182 0.84375 0.86666667 0.96428571 0.83333333 0.92307692
0.96 0.88888889 0.83333333 0.86666667]
mean value: 0.8798183344433345
key: train_precision
value: [0.90874525 0.91860465 0.92156863 0.90225564 0.91923077 0.90874525
0.90038314 0.90566038 0.92692308 0.92277992]
mean value: 0.9134896700062805
key: test_recall
value: [0.96428571 0.96428571 0.89655172 0.93103448 0.89285714 0.85714286
0.85714286 0.85714286 0.89285714 0.92857143]
mean value: 0.9041871921182266
key: train_recall
value: [0.94094488 0.93307087 0.92885375 0.9486166 0.94094488 0.94094488
0.92519685 0.94488189 0.9488189 0.94094488]
mean value: 0.9393218387227288
key: test_roc_auc
value: [0.87869458 0.89593596 0.87684729 0.9476601 0.85714286 0.89285714
0.91071429 0.875 0.85714286 0.89285714]
mean value: 0.8884852216748769
key: train_roc_auc
value: [0.92304161 0.92503346 0.9250568 0.9231272 0.92913386 0.92322835
0.91141732 0.92322835 0.93700787 0.93110236]
mean value: 0.9251377174691109
key: test_jcc
value: [0.79411765 0.81818182 0.78787879 0.9 0.75757576 0.8
0.82758621 0.77419355 0.75757576 0.8125 ]
mean value: 0.8029609523554593
key: train_jcc
value: [0.85971223 0.86181818 0.86080586 0.86021505 0.86909091 0.85971223
0.83928571 0.86021505 0.88278388 0.87226277]
mean value: 0.8625901890465713
MCC on Blind test: 0.29
Accuracy on Blind test: 0.72
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.27303624 0.25770164 0.24925447 0.25132322 0.24955368 0.25238132
0.25456405 0.25274968 0.25270486 0.25955772]
mean value: 0.25528268814086913
key: score_time
value: [0.00924158 0.0084126 0.00870824 0.00866604 0.0086503 0.0085566
0.00878334 0.00932741 0.00889683 0.00852871]
mean value: 0.008777165412902832
key: test_mcc
value: [0.92980296 0.92980296 0.8951918 1. 0.82195294 0.93094934
1. 0.89342711 0.96490128 0.92857143]
mean value: 0.9294599815844486
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 1. 0.91071429 0.96428571
1. 0.94642857 0.98214286 0.96428571]
mean value: 0.9645050125313284
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96428571 0.96428571 0.94915254 1. 0.9122807 0.96551724
1. 0.94545455 0.98181818 0.96428571]
mean value: 0.9647080355636448
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.96428571 0.96428571 0.93333333 1. 0.89655172 0.93333333
1. 0.96296296 1. 0.96428571]
mean value: 0.9619038496624703
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1.
1. 0.92857143 0.96428571 0.96428571]
mean value: 0.9679802955665024
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96490148 0.96490148 0.94704433 1. 0.91071429 0.96428571
1. 0.94642857 0.98214286 0.96428571]
mean value: 0.9644704433497537
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93103448 0.93103448 0.90322581 1. 0.83870968 0.93333333
1. 0.89655172 0.96428571 0.93103448]
mean value: 0.9329209703903808
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.3
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.01269507 0.0144012 0.01376104 0.01396441 0.01427913 0.01633024
0.01445484 0.01412106 0.01710892 0.01414371]
mean value: 0.014525961875915528
key: score_time
value: [0.01109529 0.01073503 0.01095819 0.01101351 0.01113129 0.01230645
0.01111698 0.01105785 0.01199865 0.01175356]
mean value: 0.011316680908203125
key: test_mcc
value: [0.5149026 0.65634573 0.65634573 0.76689254 0.67900461 0.57735027
0.83484711 0.64285714 0.56573571 0.71611487]
mean value: 0.6610396314952282
key: train_mcc
value: [0.76157807 0.80278863 0.76582615 0.80208917 0.81501748 0.8019582
0.81112421 0.82360735 0.73708689 0.76803489]
mean value: 0.78891110313645
key: test_accuracy
value: [0.75438596 0.8245614 0.8245614 0.87719298 0.83928571 0.78571429
0.91071429 0.82142857 0.76785714 0.85714286]
mean value: 0.8262844611528822
key: train_accuracy
value: [0.87573964 0.90138067 0.87968442 0.89940828 0.90748031 0.8996063
0.90551181 0.91141732 0.86417323 0.87992126]
mean value: 0.8924323253971952
key: test_fscore
value: [0.76666667 0.83333333 0.81481481 0.86792453 0.84210526 0.76923077
0.90196078 0.82142857 0.8 0.85185185]
mean value: 0.8269316583099514
key: train_fscore
value: [0.86509636 0.90118577 0.87103594 0.89440994 0.90693069 0.89527721
0.9047619 0.90945674 0.8738574 0.87048832]
mean value: 0.8892500281591235
key: test_precision
value: [0.71875 0.78125 0.88 0.95833333 0.82758621 0.83333333
1. 0.82142857 0.7027027 0.88461538]
mean value: 0.8407999532309878
key: train_precision
value: [0.94835681 0.9047619 0.93636364 0.93913043 0.9123506 0.93562232
0.912 0.93004115 0.81569966 0.94470046]
mean value: 0.9179026970421955
key: test_recall
value: [0.82142857 0.89285714 0.75862069 0.79310345 0.85714286 0.71428571
0.82142857 0.82142857 0.92857143 0.82142857]
mean value: 0.8230295566502464
key: train_recall
value: [0.79527559 0.8976378 0.81422925 0.85375494 0.9015748 0.85826772
0.8976378 0.88976378 0.94094488 0.80708661]
mean value: 0.8656173166101273
key: test_roc_auc
value: [0.75554187 0.82573892 0.82573892 0.87869458 0.83928571 0.78571429
0.91071429 0.82142857 0.76785714 0.85714286]
mean value: 0.8267857142857143
key: train_roc_auc
value: [0.87589866 0.90138807 0.87955557 0.89931842 0.90748031 0.8996063
0.90551181 0.91141732 0.86417323 0.87992126]
mean value: 0.892427095328499
key: test_jcc
value: [0.62162162 0.71428571 0.6875 0.76666667 0.72727273 0.625
0.82142857 0.6969697 0.66666667 0.74193548]
mean value: 0.7069347148782633
key: train_jcc
value: [0.76226415 0.82014388 0.77153558 0.80898876 0.82971014 0.81040892
0.82608696 0.83394834 0.77597403 0.77067669]
mean value: 0.8009737460973876
MCC on Blind test: 0.31
Accuracy on Blind test: 0.68
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01170778 0.01146626 0.03049469 0.03104663 0.02517843 0.0244348
0.03004813 0.03791165 0.03372884 0.02444196]
mean value: 0.026045918464660645
key: score_time
value: [0.01076579 0.01079035 0.01894236 0.01367116 0.02186847 0.02198577
0.01771045 0.02187347 0.02229476 0.02252173]
mean value: 0.018242430686950684
key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.9321832 0.71611487 0.82195294
0.85933785 0.78772636 0.67900461 0.75047877]
mean value: 0.7934221443680324
key: train_mcc
value: [0.8266528 0.81876065 0.82265144 0.82358593 0.83910959 0.81142619
0.80377277 0.81930411 0.83123063 0.80759374]
mean value: 0.8204087868380765
key: test_accuracy
value: [0.9122807 0.87719298 0.89473684 0.96491228 0.85714286 0.91071429
0.92857143 0.89285714 0.83928571 0.875 ]
mean value: 0.8952694235588973
key: train_accuracy
value: [0.91321499 0.90927022 0.9112426 0.9112426 0.91929134 0.90551181
0.9015748 0.90944882 0.91535433 0.90354331]
mean value: 0.9099694823650002
key: test_fscore
value: [0.91525424 0.8852459 0.9 0.96428571 0.86206897 0.90909091
0.92592593 0.88888889 0.84210526 0.87719298]
mean value: 0.8970058788250195
key: train_fscore
value: [0.91439689 0.91050584 0.91193738 0.9132948 0.92069632 0.90697674
0.9034749 0.91085271 0.91682785 0.90522244]
mean value: 0.9114185875040357
key: test_precision
value: [0.87096774 0.81818182 0.87096774 1. 0.83333333 0.92592593
0.96153846 0.92307692 0.82758621 0.86206897]
mean value: 0.8893647118341224
key: train_precision
value: [0.90384615 0.9 0.90310078 0.89097744 0.90494297 0.89312977
0.88636364 0.89694656 0.90114068 0.88973384]
mean value: 0.8970181835384771
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.89285714 0.89285714
0.89285714 0.85714286 0.85714286 0.89285714]
mean value: 0.9076354679802956
key: train_recall
value: [0.92519685 0.92125984 0.92094862 0.93675889 0.93700787 0.92125984
0.92125984 0.92519685 0.93307087 0.92125984]
mean value: 0.9263219320905045
key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.96551724 0.85714286 0.91071429
0.92857143 0.89285714 0.83928571 0.875 ]
mean value: 0.8955049261083744
key: train_roc_auc
value: [0.91319131 0.90924652 0.91126171 0.91129283 0.91929134 0.90551181
0.9015748 0.90944882 0.91535433 0.90354331]
mean value: 0.9099716784413806
key: test_jcc
value: [0.84375 0.79411765 0.81818182 0.93103448 0.75757576 0.83333333
0.86206897 0.8 0.72727273 0.78125 ]
mean value: 0.8148584731698322
key: train_jcc
value: [0.84229391 0.83571429 0.8381295 0.84042553 0.85304659 0.82978723
0.82394366 0.83629893 0.84642857 0.82685512]
mean value: 0.837292333932638
MCC on Blind test: 0.25
Accuracy on Blind test: 0.71
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa',
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', '...
'volumetric_mm', 'volumetric_ss', 'consurf_score', 'snap2_score',
'provean_score', 'maf', 'logorI', 'lineage_proportion',
'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.18650627 0.19624949 0.19074392 0.19366646 0.20337486 0.19410229
0.19437337 0.18854046 0.24514914 0.20517421]
mean value: 0.19978804588317872
key: score_time
value: [0.01944613 0.01078582 0.01317906 0.01077127 0.01899004 0.02023435
0.02145123 0.01528096 0.01077437 0.02091956]
mean value: 0.01618328094482422
key: test_mcc
value: [0.82942474 0.76689254 0.79110556 0.9321832 0.75434227 0.82195294
0.85933785 0.85933785 0.67900461 0.78571429]
mean value: 0.8079295836885118
key: train_mcc
value: [0.8266528 0.86654135 0.82265144 0.85931426 0.86681377 0.85105352
0.80377277 0.85922715 0.86681377 0.85105352]
mean value: 0.847389438313628
key: test_accuracy
value: [0.9122807 0.87719298 0.89473684 0.96491228 0.875 0.91071429
0.92857143 0.92857143 0.83928571 0.89285714]
mean value: 0.9024122807017544
key: train_accuracy
value: [0.91321499 0.93293886 0.9112426 0.92899408 0.93307087 0.92519685
0.9015748 0.92913386 0.93307087 0.92519685]
mean value: 0.9233634627032568
key: test_fscore
value: [0.91525424 0.8852459 0.9 0.96428571 0.88135593 0.90909091
0.92592593 0.92592593 0.84210526 0.89285714]
mean value: 0.9042046952374383
key: train_fscore
value: [0.91439689 0.93436293 0.91193738 0.93076923 0.93436293 0.92664093
0.9034749 0.93076923 0.93436293 0.92664093]
mean value: 0.9247718286234357
key: test_precision
value: [0.87096774 0.81818182 0.87096774 1. 0.83870968 0.92592593
0.96153846 0.96153846 0.82758621 0.89285714]
mean value: 0.8968273178228685
key: train_precision
value: [0.90384615 0.91666667 0.90310078 0.90636704 0.91666667 0.90909091
0.88636364 0.90977444 0.91666667 0.90909091]
mean value: 0.9077633860874135
key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714
0.89285714 0.89285714 0.85714286 0.89285714]
mean value: 0.9147783251231527
key: train_recall
value: [0.92519685 0.95275591 0.92094862 0.95652174 0.95275591 0.94488189
0.92125984 0.95275591 0.95275591 0.94488189]
mean value: 0.9424714450219415
key: test_roc_auc
value: [0.91317734 0.87869458 0.89408867 0.96551724 0.875 0.91071429
0.92857143 0.92857143 0.83928571 0.89285714]
mean value: 0.9026477832512315
key: train_roc_auc
value: [0.91319131 0.93289969 0.91126171 0.92904827 0.93307087 0.92519685
0.9015748 0.92913386 0.93307087 0.92519685]
mean value: 0.9233645077962094
key: test_jcc
value: [0.84375 0.79411765 0.81818182 0.93103448 0.78787879 0.83333333
0.86206897 0.86206897 0.72727273 0.80645161]
mean value: 0.826615834042182
key: train_jcc
value: [0.84229391 0.87681159 0.8381295 0.8705036 0.87681159 0.86330935
0.82394366 0.8705036 0.87681159 0.86330935]
mean value: 0.8602427747074015
MCC on Blind test: 0.25
Accuracy on Blind test: 0.7