LSHTM_analysis/scripts/ml/log_gid_config.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 531
PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
count of NULL values before imputation
or_mychisq 263
log10_or_mychisq 263
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 44
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({0: 76, 1: 43}) Data dim: (119, 51)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (119, 51)
Test data size: (412, 51)
y_train numbers: Counter({0: 76, 1: 43})
y_train ratio: 1.7674418604651163
y_test_numbers: Counter({0: 409, 1: 3})
y_test ratio: 136.33333333333334
-------------------------------------------------------------
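The ratios reported above appear to be the majority-class count divided by the minority-class count (76/43 and 409/3). A minimal sketch of that calculation follows, assuming binary integer labels. `class_ratio` is a hypothetical helper name, not taken from ml_data.py, and the label lists are rebuilt from the printed counts rather than from the real data.

```python
from collections import Counter


def class_ratio(labels):
    """Return (class counts, majority/minority ratio) for a sequence of labels."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    ordered = counts.most_common()  # sorted from most to least frequent
    majority = ordered[0][1]
    minority = ordered[-1][1]
    return counts, majority / minority


# Labels rebuilt from the counts printed in the log (not the real data).
y_train = [0] * 76 + [1] * 43
y_test = [0] * 409 + [1] * 3

train_counts, train_ratio = class_ratio(y_train)
test_counts, test_ratio = class_ratio(y_test)
print(train_counts, train_ratio)
print(test_counts, test_ratio)
```

With only 3 positive cases in the blind test set, any metric computed on that set rests on very few minority examples.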
Simple Random OverSampling
Counter({0: 76, 1: 76})
(152, 51)
Simple Random UnderSampling
Counter({0: 43, 1: 43})
(86, 51)
Simple Combined Over and UnderSampling
Counter({0: 76, 1: 76})
(152, 51)
SMOTE_NC OverSampling
Counter({0: 76, 1: 76})
(152, 51)
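The resampled counts above are consistent with balancing the two classes at 76 each (oversampling) or 43 each (undersampling). The log does not show which library produced them, so the following is only a standard-library sketch of simple random oversampling. `random_oversample` is a hypothetical name, the rows are placeholders, and this does not reproduce SMOTE_NC, which synthesises new samples rather than duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows for this class, drawn from the original data only.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Placeholder rows: only the class counts come from the log.
labels = [0] * 76 + [1] * 43
rows = [[i] for i in range(len(labels))]

X_res, y_res = random_oversample(rows, labels)
print(Counter(y_res), len(X_res))
```

Resampling of this kind is normally applied to the training data only, so that duplicated rows never leak into the evaluation set.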
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: gid
Drug name: streptomycin
Output directory: /home/tanu/git/Data/streptomycin/output/ml/uq_v1/
Sanity checks:
Total input features: 51
Training data size: (119, 51)
Test data size: (412, 51)
Target feature numbers (training data): Counter({0: 76, 1: 43})
Target features ratio (training data): 1.7674418604651163
Target feature numbers (test data): Counter({0: 409, 1: 3})
Target features ratio (test data): 136.33333333333334
#####################################################################
================================================================
Structural features (n): 35
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01348901 0.01226354 0.01228833 0.01419163 0.01196408 0.01235008
0.01226306 0.01198006 0.01196694 0.01303458]
mean value: 0.012579131126403808
key: score_time
value: [0.00877213 0.00875831 0.0089767 0.00837636 0.00834727 0.00831747
0.00833845 0.00829291 0.00832725 0.00867438]
mean value: 0.008518123626708984
key: test_mcc
value: [0.42640143 0.40824829 0. 0.625 0.63245553 0.70710678
0.68313005 0.83666003 0.31428571 0.62360956]
mean value: 0.5256897392741394
key: train_mcc
value: [0.73433335 0.80052092 0.81774488 0.71490799 0.77603911 0.73433335
0.75414636 0.75414636 0.79379397 0.7364483 ]
mean value: 0.7616414584886299
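The `test_mcc` and `train_mcc` entries are per-fold Matthews correlation coefficients, and each `mean value` is their arithmetic mean. A small sketch of both steps, using only the standard library: the fold scores are copied from the log, while the confusion-matrix counts in the last example are hypothetical and chosen only to show how a score of 0.625 can arise. Returning 0 when the denominator is zero is an assumed convention here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Per-fold test scores as printed in the log.
test_mcc = [0.42640143, 0.40824829, 0.0, 0.625, 0.63245553,
            0.70710678, 0.68313005, 0.83666003, 0.31428571, 0.62360956]
mean_test_mcc = sum(test_mcc) / len(test_mcc)
print(round(mean_test_mcc, 6))

# Hypothetical fold with 12 samples (8 negative, 4 positive).
example = mcc(tp=3, tn=7, fp=1, fn=1)
print(example)
```

The gap between the mean training score (about 0.76) and the mean test score (about 0.53), together with the fold that scores 0, suggests the estimate is unstable on a training set of 119 rows.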
key: test_accuracy
value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.75 0.75 0.5 0.83333333 0.83333333 0.83333333
0.83333333 0.91666667 0.66666667 0.81818182]
mean value: 0.7734848484848484
key: train_accuracy
value: [0.87850467 0.90654206 0.91588785 0.86915888 0.89719626 0.87850467
0.88785047 0.88785047 0.90654206 0.87962963]
mean value: 0.89076670128072
key: test_fscore
value: [0.4 0.57142857 0.4 0.75 0.66666667 0.8
0.75 0.88888889 0.6 0.66666667]
mean value: 0.6493650793650794
key: train_fscore
value: [0.82191781 0.85714286 0.87671233 0.8 0.84931507 0.82191781
0.82352941 0.82352941 0.86111111 0.81690141]
mean value: 0.8352077213932715
key: test_precision
value: [1. 0.66666667 0.33333333 0.75 1. 0.66666667
1. 1. 0.6 1. ]
mean value: 0.8016666666666666
key: train_precision
value: [0.88235294 0.96774194 0.94117647 0.90322581 0.91176471 0.88235294
0.93333333 0.93333333 0.91176471 0.90625 ]
mean value: 0.9173296173308033
key: test_recall
value: [0.25 0.5 0.5 0.75 0.5 1. 0.6 0.8 0.6 0.5 ]
mean value: 0.6
key: train_recall
value: [0.76923077 0.76923077 0.82051282 0.71794872 0.79487179 0.76923077
0.73684211 0.73684211 0.81578947 0.74358974]
mean value: 0.7674089068825911
key: test_roc_auc
value: [0.625 0.6875 0.5 0.8125 0.75 0.875
0.8 0.9 0.65714286 0.75 ]
mean value: 0.7357142857142858
key: train_roc_auc
value: [0.85520362 0.87726244 0.89555053 0.83691554 0.87537707 0.85520362
0.8539283 0.8539283 0.88615561 0.85005574]
mean value: 0.8639580766297014
key: test_jcc
value: [0.25 0.4 0.25 0.6 0.5 0.66666667
0.6 0.8 0.42857143 0.5 ]
mean value: 0.49952380952380954
key: train_jcc
value: [0.69767442 0.75 0.7804878 0.66666667 0.73809524 0.69767442
0.7 0.7 0.75609756 0.69047619]
mean value: 0.7177172298301056
MCC on Blind test: 0.15
Accuracy on Blind test: 0.77
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.38598299 0.37587595 0.37197232 0.36586213 0.3655436 0.3787601
0.37384486 0.3567059 0.36220002 0.34969902]
mean value: 0.3686446905136108
key: score_time
value: [0.00918126 0.00917006 0.00951552 0.00891447 0.00908375 0.00929761
0.00941563 0.00886798 0.00938153 0.00919795]
mean value: 0.00920257568359375
key: test_mcc
value: [1. 0.625 0.35355339 0.83666003 0.625 0.70710678
0.83666003 0.83666003 0.50709255 0.60714286]
mean value: 0.6934875661362015
key: train_mcc
value: [0.89876312 1. 0.9600061 0.85805669 0.95965309 0.95965309
0.93862091 0.85625561 1. 0.81859189]
mean value: 0.9249600511154796
key: test_accuracy
value: [1. 0.83333333 0.66666667 0.91666667 0.83333333 0.83333333
0.91666667 0.91666667 0.75 0.81818182]
mean value: 0.8484848484848485
key: train_accuracy
value: [0.95327103 1. 0.98130841 0.93457944 0.98130841 0.98130841
0.97196262 0.93457944 1. 0.91666667]
mean value: 0.9654984423676012
key: test_fscore
value: [1. 0.75 0.6 0.88888889 0.75 0.8
0.88888889 0.88888889 0.72727273 0.75 ]
mean value: 0.8043939393939394
key: train_fscore
value: [0.93506494 1. 0.97368421 0.90666667 0.97435897 0.97435897
0.96 0.90410959 1. 0.87671233]
mean value: 0.9504955678784085
key: test_precision
value: [1. 0.75 0.5 0.8 0.75 0.66666667
1. 1. 0.66666667 0.75 ]
mean value: 0.7883333333333333
key: train_precision
value: [0.94736842 1. 1. 0.94444444 0.97435897 0.97435897
0.97297297 0.94285714 1. 0.94117647]
mean value: 0.9697537400633376
key: test_recall
value: [1. 0.75 0.75 1. 0.75 1. 0.8 0.8 0.8 0.75]
mean value: 0.84
key: train_recall
value: [0.92307692 1. 0.94871795 0.87179487 0.97435897 0.97435897
0.94736842 0.86842105 1. 0.82051282]
mean value: 0.9328609986504723
key: test_roc_auc
value: [1. 0.8125 0.6875 0.9375 0.8125 0.875
0.9 0.9 0.75714286 0.80357143]
mean value: 0.8485714285714285
key: train_roc_auc
value: [0.94683258 1. 0.97435897 0.92119155 0.97982655 0.97982655
0.96643783 0.91971777 1. 0.89576366]
mean value: 0.9583955462135567
key: test_jcc
value: [1. 0.6 0.42857143 0.8 0.6 0.66666667
0.8 0.8 0.57142857 0.6 ]
mean value: 0.6866666666666666
key: train_jcc
value: [0.87804878 1. 0.94871795 0.82926829 0.95 0.95
0.92307692 0.825 1. 0.7804878 ]
mean value: 0.9084599749843653
MCC on Blind test: 0.01
Accuracy on Blind test: 0.7
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00942707 0.00902557 0.00695324 0.00660396 0.00666595 0.00662398
0.00659394 0.00675702 0.00663805 0.00661373]
mean value: 0.007190251350402832
key: score_time
value: [0.01058674 0.01051068 0.00814915 0.00790286 0.00790191 0.0078671
0.00790071 0.00779772 0.00793719 0.0078752 ]
mean value: 0.008442926406860351
key: test_mcc
value: [0.81649658 0.47809144 0.5 0.23904572 0.35355339 0.47809144
0.16903085 0.50709255 0.16903085 0.35634832]
mean value: 0.4066781158133809
key: train_mcc
value: [0.63375685 0.67693504 0.66003337 0.51450646 0.70701192 0.58648859
0.69614472 0.60558322 0.65590587 0.6700827 ]
mean value: 0.6406448743407447
key: test_accuracy
value: [0.91666667 0.75 0.66666667 0.58333333 0.66666667 0.75
0.58333333 0.75 0.58333333 0.54545455]
mean value: 0.6795454545454546
key: train_accuracy
value: [0.80373832 0.8317757 0.8317757 0.71962617 0.85046729 0.81308411
0.8317757 0.82242991 0.80373832 0.83333333]
mean value: 0.8141744548286605
key: test_fscore
value: [0.85714286 0.66666667 0.66666667 0.54545455 0.6 0.66666667
0.54545455 0.72727273 0.54545455 0.61538462]
mean value: 0.6436163836163835
key: train_fscore
value: [0.77419355 0.8 0.79069767 0.70588235 0.81818182 0.71428571
0.80434783 0.6984127 0.77894737 0.79545455]
mean value: 0.7680403546589664
key: test_precision
value: [1. 0.6 0.5 0.42857143 0.5 0.6
0.5 0.66666667 0.5 0.44444444]
mean value: 0.5739682539682539
key: train_precision
value: [0.66666667 0.70588235 0.72340426 0.57142857 0.73469388 0.80645161
0.68518519 0.88 0.64912281 0.71428571]
mean value: 0.7137121043298253
key: test_recall
value: [0.75 0.75 1. 0.75 0.75 0.75 0.6 0.8 0.6 1. ]
mean value: 0.775
key: train_recall
value: [0.92307692 0.92307692 0.87179487 0.92307692 0.92307692 0.64102564
0.97368421 0.57894737 0.97368421 0.8974359 ]
mean value: 0.8628879892037787
key: test_roc_auc
value: [0.875 0.75 0.75 0.625 0.6875 0.75
0.58571429 0.75714286 0.58571429 0.64285714]
mean value: 0.7008928571428571
key: train_roc_auc
value: [0.82918552 0.85124434 0.8403092 0.76300905 0.86595023 0.77639517
0.8636537 0.76773455 0.84191457 0.84726867]
mean value: 0.8246665009957512
key: test_jcc
value: [0.75 0.5 0.5 0.375 0.42857143 0.5
0.375 0.57142857 0.375 0.44444444]
mean value: 0.48194444444444445
key: train_jcc
value: [0.63157895 0.66666667 0.65384615 0.54545455 0.69230769 0.55555556
0.67272727 0.53658537 0.63793103 0.66037736]
mean value: 0.625303059275329
MCC on Blind test: 0.03
Accuracy on Blind test: 0.49
Model_name: Naive Bayes
Model func: BernoulliNB()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00705171 0.00684381 0.00682545 0.00677752 0.0068419 0.0067687
0.00679421 0.00680947 0.0066936 0.00685263]
mean value: 0.0068259000778198246
key: score_time
value: [0.00836945 0.0079124 0.0078671 0.00792718 0.00797105 0.00790572
0.00787711 0.00786543 0.00802302 0.00789714]
mean value: 0.007961559295654296
key: test_mcc
value: [ 0. 0.25 -0.23904572 0.47809144 0.40824829 0.
-0.09759001 0.52915026 0.31428571 0.38575837]
mean value: 0.20288983564397506
key: train_mcc
value: [0.4754902 0.50673892 0.4653488 0.44239297 0.48817818 0.50337256
0.39242808 0.39534618 0.48161946 0.37522992]
mean value: 0.4526145268371993
key: test_accuracy
value: [0.66666667 0.66666667 0.41666667 0.75 0.75 0.41666667
0.5 0.75 0.66666667 0.72727273]
mean value: 0.6310606060606061
key: train_accuracy
value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093
0.72897196 0.71962617 0.76635514 0.72222222]
mean value: 0.7516614745586708
key: test_fscore
value: [0. 0.5 0.22222222 0.66666667 0.57142857 0.46153846
0.25 0.57142857 0.6 0.57142857]
mean value: 0.4414713064713065
key: train_fscore
value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667
0.5915493 0.61538462 0.65753425 0.57142857]
mean value: 0.6390358039788872
key: test_precision
value: [0. 0.5 0.2 0.6 0.66666667 0.33333333
0.33333333 1. 0.6 0.66666667]
mean value: 0.49
key: train_precision
value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273
0.63636364 0.6 0.68571429 0.64516129]
mean value: 0.6732093639019635
key: test_recall
value: [0. 0.5 0.25 0.75 0.5 0.75 0.2 0.4 0.6 0.5 ]
mean value: 0.445
key: train_recall
value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462
0.55263158 0.63157895 0.63157895 0.51282051]
mean value: 0.6097840755735493
key: test_roc_auc
value: [0.5 0.625 0.375 0.75 0.6875 0.5
0.45714286 0.7 0.65714286 0.67857143]
mean value: 0.5930357142857143
key: train_roc_auc
value: [0.7377451 0.74698341 0.72680995 0.71398944 0.73963047 0.74151584
0.68935927 0.69984744 0.73607933 0.67670011]
mean value: 0.7208660360817448
key: test_jcc
value: [0. 0.33333333 0.125 0.5 0.4 0.3
0.14285714 0.4 0.42857143 0.4 ]
mean value: 0.30297619047619045
key: train_jcc
value: [0.5 0.51020408 0.48 0.46 0.5 0.5
0.42 0.44444444 0.48979592 0.4 ]
mean value: 0.47044444444444444
MCC on Blind test: 0.14
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00671077 0.00906372 0.00673389 0.00643206 0.00652814 0.00724578
0.00731564 0.00711012 0.00714564 0.00714374]
mean value: 0.007142949104309082
key: score_time
value: [0.04456663 0.02610064 0.00889969 0.00866151 0.00879526 0.00940728
0.00941706 0.00944591 0.00944233 0.00941896]
mean value: 0.014415526390075683
key: test_mcc
value: [ 0. 0. 0.47809144 0.625 0.15811388 0.47809144
0.07559289 0.29277002 0.11952286 -0.03857584]
mean value: 0.21886067104052556
key: train_mcc
value: [0.47836451 0.54358024 0.65128682 0.48080439 0.47687292 0.38417516
0.55925621 0.60298802 0.55802654 0.50141804]
mean value: 0.5236772842107089
key: test_accuracy
value: [0.66666667 0.58333333 0.75 0.83333333 0.66666667 0.75
0.58333333 0.66666667 0.58333333 0.54545455]
mean value: 0.6628787878787878
key: train_accuracy
value: [0.76635514 0.79439252 0.8411215 0.76635514 0.76635514 0.72897196
0.80373832 0.82242991 0.80373832 0.77777778]
mean value: 0.7871235721703012
key: test_fscore
value: [0. 0.28571429 0.66666667 0.75 0.33333333 0.66666667
0.28571429 0.5 0.44444444 0.28571429]
mean value: 0.4218253968253968
key: train_fscore
value: [0.63768116 0.66666667 0.75362319 0.64788732 0.62686567 0.53968254
0.69565217 0.70769231 0.67692308 0.64705882]
mean value: 0.6599732931818586
key: test_precision
value: [0. 0.33333333 0.6 0.75 0.5 0.6
0.5 0.66666667 0.5 0.33333333]
mean value: 0.47833333333333333
key: train_precision
value: [0.73333333 0.81481481 0.86666667 0.71875 0.75 0.70833333
0.77419355 0.85185185 0.81481481 0.75862069]
mean value: 0.7791379052857084
key: test_recall
value: [0. 0.25 0.75 0.75 0.25 0.75 0.2 0.4 0.4 0.25]
mean value: 0.4
key: train_recall
value: [0.56410256 0.56410256 0.66666667 0.58974359 0.53846154 0.43589744
0.63157895 0.60526316 0.57894737 0.56410256]
mean value: 0.5738866396761133
key: test_roc_auc
value: [0.5 0.5 0.75 0.8125 0.5625 0.75
0.52857143 0.62857143 0.55714286 0.48214286]
mean value: 0.6071428571428571
key: train_roc_auc
value: [0.72322775 0.74528658 0.80392157 0.72869532 0.71776018 0.66647813
0.76506484 0.77364607 0.7532418 0.73132664]
mean value: 0.7408648884655077
key: test_jcc
value: [0. 0.16666667 0.5 0.6 0.2 0.5
0.16666667 0.33333333 0.28571429 0.16666667]
mean value: 0.2919047619047619
key: train_jcc
value: [0.46808511 0.5 0.60465116 0.47916667 0.45652174 0.36956522
0.53333333 0.54761905 0.51162791 0.47826087]
mean value: 0.4948831049856425
MCC on Blind test: 0.04
Accuracy on Blind test: 0.82
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00767875 0.00732279 0.00758123 0.00753927 0.00771689 0.00793982
0.00819755 0.00749803 0.00751305 0.00764227]
mean value: 0.0076629638671875
key: score_time
value: [0.00804496 0.00836205 0.00849819 0.0081315 0.00821066 0.00882912
0.0087676 0.00863767 0.00823522 0.00829029]
mean value: 0.008400726318359374
key: test_mcc
value: [0.42640143 0.40824829 0.11952286 0.81649658 0.63245553 0.83666003
0.35675303 0.52915026 0.11952286 0.41833001]
mean value: 0.4663540894023734
key: train_mcc
value: [0.71777084 0.72240602 0.71777084 0.69776211 0.73774797 0.67769958
0.672375 0.71336904 0.78283392 0.67891024]
mean value: 0.7118645559720945
key: test_accuracy
value: [0.75 0.75 0.58333333 0.91666667 0.83333333 0.91666667
0.66666667 0.75 0.58333333 0.72727273]
mean value: 0.7477272727272727
key: train_accuracy
value: [0.86915888 0.86915888 0.86915888 0.85981308 0.87850467 0.85046729
0.85046729 0.86915888 0.89719626 0.85185185]
mean value: 0.8664935964001385
key: test_fscore
value: [0.4 0.57142857 0.44444444 0.85714286 0.66666667 0.88888889
0.33333333 0.57142857 0.44444444 0.4 ]
mean value: 0.5577777777777778
key: train_fscore
value: [0.79411765 0.78787879 0.79411765 0.7761194 0.8115942 0.75757576
0.75 0.78787879 0.83076923 0.75757576]
mean value: 0.7847627221679594
key: test_precision
value: [1. 0.66666667 0.4 1. 1. 0.8
1. 1. 0.5 1. ]
mean value: 0.8366666666666667
key: train_precision
value: [0.93103448 0.96296296 0.93103448 0.92857143 0.93333333 0.92592593
0.92307692 0.92857143 1. 0.92592593]
mean value: 0.939043689388517
key: test_recall
value: [0.25 0.5 0.5 0.75 0.5 1. 0.2 0.4 0.4 0.25]
mean value: 0.475
key: train_recall
value: [0.69230769 0.66666667 0.69230769 0.66666667 0.71794872 0.64102564
0.63157895 0.68421053 0.71052632 0.64102564]
mean value: 0.6744264507422402
key: test_roc_auc
value: [0.625 0.6875 0.5625 0.875 0.75 0.9375
0.6 0.7 0.55714286 0.625 ]
mean value: 0.6919642857142857
key: train_roc_auc
value: [0.83144796 0.82598039 0.83144796 0.81862745 0.84426848 0.80580694
0.80129672 0.82761251 0.85526316 0.80602007]
mean value: 0.8247771639900459
key: test_jcc
value: [0.25 0.4 0.28571429 0.75 0.5 0.8
0.2 0.4 0.28571429 0.25 ]
mean value: 0.41214285714285714
key: train_jcc
value: [0.65853659 0.65 0.65853659 0.63414634 0.68292683 0.6097561
0.6 0.65 0.71052632 0.6097561 ]
mean value: 0.6464184852374839
MCC on Blind test: 0.16
Accuracy on Blind test: 0.79
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.44463158 0.42616534 0.55910635 0.43400979 0.4541378 0.42967033
0.43932247 0.51193166 0.42841673 0.43269181]
mean value: 0.4560083866119385
key: score_time
value: [0.01107335 0.01112819 0.01112199 0.0153079 0.01128125 0.0111084
0.02174282 0.01112676 0.01113582 0.01421332]
mean value: 0.012923979759216308
key: test_mcc
value: [0.81649658 0.83666003 0. 0.70710678 0.15811388 0.47809144
0.07559289 0.47809144 0.31428571 0.38575837]
mean value: 0.42501971429170726
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.91666667 0.5 0.83333333 0.66666667 0.75
0.58333333 0.75 0.66666667 0.72727273]
mean value: 0.7310606060606061
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.85714286 0.88888889 0.4 0.8 0.33333333 0.66666667
0.28571429 0.66666667 0.6 0.57142857]
mean value: 0.606984126984127
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 0.33333333 0.66666667 0.5 0.6
0.5 0.75 0.6 0.66666667]
mean value: 0.6416666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 1. 0.5 1. 0.25 0.75 0.2 0.6 0.6 0.5 ]
mean value: 0.615
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.9375 0.5 0.875 0.5625 0.75
0.52857143 0.72857143 0.65714286 0.67857143]
mean value: 0.7092857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75 0.8 0.25 0.66666667 0.2 0.5
0.16666667 0.5 0.42857143 0.4 ]
mean value: 0.4661904761904762
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.07
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01969337 0.00755858 0.00811434 0.00742817 0.0073216 0.00782657
0.00776482 0.00792885 0.00732517 0.00794721]
mean value: 0.008890867233276367
key: score_time
value: [0.01085663 0.00857472 0.00874376 0.0083375 0.00824308 0.00869703
0.00888395 0.00872183 0.00871086 0.00867748]
mean value: 0.008844685554504395
key: test_mcc
value: [0.83666003 0.625 0.81649658 0.81649658 0.83666003 1.
0.50709255 0.84515425 0.65714286 0.81009259]
mean value: 0.7750795466933069
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 1.
0.75 0.91666667 0.83333333 0.90909091]
mean value: 0.8909090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.75 0.85714286 0.85714286 0.88888889 1.
0.72727273 0.90909091 0.8 0.85714286]
mean value: 0.8535569985569985
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.75 1. 1. 0.8 1.
0.66666667 0.83333333 0.8 1. ]
mean value: 0.865
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75]
mean value: 0.86
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.8125 0.875 0.875 0.9375 1.
0.75714286 0.92857143 0.82857143 0.875 ]
mean value: 0.8826785714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.6 0.75 0.75 0.8 1.
0.57142857 0.83333333 0.66666667 0.75 ]
mean value: 0.7521428571428571
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.0870738 0.08674216 0.08633971 0.0796845 0.08702898 0.08723402
0.0800159 0.08267999 0.08336306 0.0805757 ]
mean value: 0.08407378196716309
key: score_time
value: [0.01838231 0.01821375 0.01814437 0.01772857 0.01825523 0.01835394
0.01846385 0.01686049 0.01691628 0.0185349 ]
mean value: 0.01798536777496338
key: test_mcc
value: [0.63245553 0.40824829 0.625 1. 0.40824829 0.83666003
0.35675303 0.68313005 0.50709255 0. ]
mean value: 0.5457587777402898
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.75 0.83333333 1. 0.75 0.91666667
0.66666667 0.83333333 0.75 0.63636364]
mean value: 0.796969696969697
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.57142857 0.75 1. 0.57142857 0.88888889
0.33333333 0.75 0.72727273 0. ]
mean value: 0.625901875901876
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.66666667 0.75 1. 0.66666667 0.8
1. 1. 0.66666667 0. ]
mean value: 0.755
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.5 0.75 1. 0.5 1. 0.2 0.6 0.8 0. ]
mean value: 0.585
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.6875 0.8125 1. 0.6875 0.9375
0.6 0.8 0.75714286 0.5 ]
mean value: 0.7532142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.4 0.6 1. 0.4 0.8
0.2 0.6 0.57142857 0. ]
mean value: 0.5071428571428571
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.16
Accuracy on Blind test: 0.77
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00703287 0.00692582 0.00732732 0.00697088 0.00684643 0.00690222
0.00689054 0.00692654 0.00703335 0.00678849]
mean value: 0.006964445114135742
key: score_time
value: [0.00811267 0.00805044 0.00872636 0.00804448 0.00804257 0.00807214
0.0084269 0.00811172 0.00793386 0.00806904]
mean value: 0.008159017562866211
key: test_mcc
value: [ 0.63245553 0.63245553 0.25 0. 0.625 0.15811388
0.47809144 0.29277002 -0.23904572 0.38575837]
mean value: 0.3215599065732439
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.83333333 0.83333333 0.66666667 0.58333333 0.83333333 0.66666667
0.75 0.66666667 0.41666667 0.72727273]
mean value: 0.6977272727272728
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.66666667 0.5 0.28571429 0.75 0.33333333
0.66666667 0.5 0.22222222 0.57142857]
mean value: 0.5162698412698412
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 1. 0.5 0.33333333 0.75 0.5
0.75 0.66666667 0.25 0.66666667]
mean value: 0.6416666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.5 0.5 0.25 0.75 0.25 0.6 0.4 0.2 0.5 ]
mean value: 0.445
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.75 0.625 0.5 0.8125 0.5625
0.72857143 0.62857143 0.38571429 0.67857143]
mean value: 0.6421428571428571
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.5 0.33333333 0.16666667 0.6 0.2
0.5 0.33333333 0.125 0.4 ]
mean value: 0.36583333333333334
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.6
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [0.9999063 0.96824765 0.96530747 1.01568055 1.00144243 1.0073843
1.0299902 0.9734323 0.96623063 0.96925235]
mean value: 0.9896874189376831
key: score_time
value: [0.08913732 0.08848977 0.09147906 0.0936265 0.09639764 0.09607625
0.08916879 0.08925462 0.08908725 0.08973861]
mean value: 0.09124557971954346
key: test_mcc
value: [1. 0.625 0.625 1. 0.40824829 1.
0.83666003 0.65714286 0.65714286 0.81009259]
mean value: 0.7619286618584635
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.83333333 0.83333333 1. 0.75 1.
0.91666667 0.83333333 0.83333333 0.90909091]
mean value: 0.8909090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.75 0.75 1. 0.57142857 1.
0.88888889 0.8 0.8 0.85714286]
mean value: 0.8417460317460318
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 0.75 1. 0.66666667 1.
1. 0.8 0.8 1. ]
mean value: 0.8766666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 1. 0.5 1. 0.8 0.8 0.8 0.75]
mean value: 0.8150000000000001
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.8125 0.8125 1. 0.6875 1.
0.9 0.82857143 0.82857143 0.875 ]
mean value: 0.8744642857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.6 0.6 1. 0.4 1.
0.8 0.66666667 0.66666667 0.75 ]
mean value: 0.7483333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.86
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: fit_time
value: [1.706635 0.86073232 0.898561 0.83530664 0.94017434 0.93120837
0.82625723 0.85147214 0.83630848 0.80121708]
mean value: 0.948787260055542
key: score_time
value: [0.23586416 0.21526957 0.2305944 0.21968794 0.23709798 0.14258313
0.17830396 0.22208166 0.23773837 0.23715353]
mean value: 0.215637469291687
key: test_mcc
value: [0.81649658 0.625 0.625 0.81649658 0.63245553 1.
0.52915026 0.47809144 0.68313005 0.81009259]
mean value: 0.701591303820076
key: train_mcc
value: [0.94025192 0.9600061 0.94025192 0.9600061 0.9600061 0.9600061
0.95952175 0.95952175 0.97968078 0.94053994]
mean value: 0.9559792483395588
key: test_accuracy
value: [0.91666667 0.83333333 0.83333333 0.91666667 0.83333333 1.
0.75 0.75 0.83333333 0.90909091]
mean value: 0.8575757575757575
key: train_accuracy
value: [0.97196262 0.98130841 0.97196262 0.98130841 0.98130841 0.98130841
0.98130841 0.98130841 0.99065421 0.97222222]
mean value: 0.9794652128764278
key: test_fscore
value: [0.85714286 0.75 0.75 0.85714286 0.66666667 1.
0.57142857 0.66666667 0.75 0.85714286]
mean value: 0.7726190476190475
key: train_fscore
value: [0.96 0.97368421 0.96 0.97368421 0.97368421 0.97368421
0.97297297 0.97297297 0.98666667 0.96 ]
mean value: 0.9707349454717876
key: test_precision
value: [1. 0.75 0.75 1. 1. 1. 1. 0.75 1. 1. ]
mean value: 0.925
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.75 0.75 0.75 0.5 1. 0.4 0.6 0.6 0.75]
mean value: 0.685
key: train_recall
value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795
0.94736842 0.94736842 0.97368421 0.92307692]
mean value: 0.9432523616734143
key: test_roc_auc
value: [0.875 0.8125 0.8125 0.875 0.75 1.
0.7 0.72857143 0.8 0.875 ]
mean value: 0.8228571428571428
key: train_roc_auc
value: [0.96153846 0.97435897 0.96153846 0.97435897 0.97435897 0.97435897
0.97368421 0.97368421 0.98684211 0.96153846]
mean value: 0.9716261808367072
key: test_jcc
value: [0.75 0.6 0.6 0.75 0.5 1. 0.4 0.5 0.6 0.75]
mean value: 0.645
key: train_jcc
value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795
0.94736842 0.94736842 0.97368421 0.92307692]
mean value: 0.9432523616734143
MCC on Blind test: 0.14
Accuracy on Blind test: 0.87
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01675677 0.00714898 0.0082562 0.0074234 0.00729084 0.00770926
0.00744081 0.00784683 0.00782228 0.00789189]
mean value: 0.00855872631072998
key: score_time
value: [0.01316333 0.00809813 0.0098176 0.00799656 0.00866699 0.0089457
0.00823903 0.00895739 0.00873828 0.0089376 ]
mean value: 0.009156060218811036
key: test_mcc
value: [ 0. 0.25 -0.23904572 0.47809144 0.40824829 0.
-0.09759001 0.52915026 0.31428571 0.38575837]
mean value: 0.20288983564397506
key: train_mcc
value: [0.4754902 0.50673892 0.4653488 0.44239297 0.48817818 0.50337256
0.39242808 0.39534618 0.48161946 0.37522992]
mean value: 0.4526145268371993
key: test_accuracy
value: [0.66666667 0.66666667 0.41666667 0.75 0.75 0.41666667
0.5 0.75 0.66666667 0.72727273]
mean value: 0.6310606060606061
key: train_accuracy
value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093
0.72897196 0.71962617 0.76635514 0.72222222]
mean value: 0.7516614745586708
key: test_fscore
value: [0. 0.5 0.22222222 0.66666667 0.57142857 0.46153846
0.25 0.57142857 0.6 0.57142857]
mean value: 0.4414713064713065
key: train_fscore
value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667
0.5915493 0.61538462 0.65753425 0.57142857]
mean value: 0.6390358039788872
key: test_precision
value: [0. 0.5 0.2 0.6 0.66666667 0.33333333
0.33333333 1. 0.6 0.66666667]
mean value: 0.49
key: train_precision
value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273
0.63636364 0.6 0.68571429 0.64516129]
mean value: 0.6732093639019635
key: test_recall
value: [0. 0.5 0.25 0.75 0.5 0.75 0.2 0.4 0.6 0.5 ]
mean value: 0.445
key: train_recall
value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462
0.55263158 0.63157895 0.63157895 0.51282051]
mean value: 0.6097840755735493
key: test_roc_auc
value: [0.5 0.625 0.375 0.75 0.6875 0.5
0.45714286 0.7 0.65714286 0.67857143]
mean value: 0.5930357142857143
key: train_roc_auc
value: [0.7377451 0.74698341 0.72680995 0.71398944 0.73963047 0.74151584
0.68935927 0.69984744 0.73607933 0.67670011]
mean value: 0.7208660360817448
key: test_jcc
value: [0. 0.33333333 0.125 0.5 0.4 0.3
0.14285714 0.4 0.42857143 0.4 ]
mean value: 0.30297619047619045
key: train_jcc
value: [0.5 0.51020408 0.48 0.46 0.5 0.5
0.42 0.44444444 0.48979592 0.4 ]
mean value: 0.47044444444444444
MCC on Blind test: 0.14
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.07072282 0.03617072 0.03579974 0.03728676 0.03555346 0.03524041
0.03362727 0.03646469 0.05544496 0.09033132]
mean value: 0.04666421413421631
key: score_time
value: [0.01112819 0.0112555 0.01102757 0.01083326 0.01092458 0.01118302
0.01061773 0.01085544 0.00988364 0.0103364 ]
mean value: 0.010804533958435059
key: test_mcc
value: [1. 0.625 0.81649658 0.81649658 0.83666003 1.
0.65714286 1. 0.65714286 0.81009259]
mean value: 0.8219031489976224
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.83333333 0.91666667 0.91666667 0.91666667 1.
0.83333333 1. 0.83333333 0.90909091]
mean value: 0.9159090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.75 0.85714286 0.85714286 0.88888889 1.
0.8 1. 0.8 0.85714286]
mean value: 0.881031746031746
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 1. 1. 0.8 1. 0.8 1. 0.8 1. ]
mean value: 0.915
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75]
mean value: 0.86
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.8125 0.875 0.875 0.9375 1.
0.82857143 1. 0.82857143 0.875 ]
mean value: 0.9032142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.6 0.75 0.75 0.8 1.
0.66666667 1. 0.66666667 0.75 ]
mean value: 0.7983333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01759219 0.01122999 0.01159906 0.01160884 0.011446 0.0116322
0.01226497 0.01147413 0.01333094 0.01733923]
mean value: 0.01295175552368164
key: score_time
value: [0.01068926 0.01071072 0.01065826 0.01099992 0.01074338 0.01079488
0.01092148 0.01075888 0.01078367 0.01083136]
mean value: 0.010789179801940918
key: test_mcc
value: [ 0.625 0.25 0.35355339 0.83666003 0.83666003 0.83666003
0.65714286 1. 0.71428571 -0.17857143]
mean value: 0.5931390613052643
key: train_mcc
value: [0.90236159 0.96085507 0.96085507 0.90236159 0.93999796 0.92091277
0.92008523 0.92008523 0.92008523 0.96106604]
mean value: 0.9308665771065557
key: test_accuracy
value: [0.83333333 0.66666667 0.66666667 0.91666667 0.91666667 0.91666667
0.83333333 1. 0.83333333 0.45454545]
mean value: 0.8037878787878788
key: train_accuracy
value: [0.95327103 0.98130841 0.98130841 0.95327103 0.97196262 0.96261682
0.96261682 0.96261682 0.96261682 0.98148148]
mean value: 0.9673070266528211
key: test_fscore
value: [0.75 0.5 0.6 0.88888889 0.88888889 0.88888889
0.8 1. 0.83333333 0.25 ]
mean value: 0.74
key: train_fscore
value: [0.9382716 0.975 0.975 0.9382716 0.96202532 0.95
0.94871795 0.94871795 0.94871795 0.975 ]
mean value: 0.9559722372486086
key: test_precision
value: [0.75 0.5 0.5 0.8 0.8 0.8
0.8 1. 0.71428571 0.25 ]
mean value: 0.6914285714285715
key: train_precision
value: [0.9047619 0.95121951 0.95121951 0.9047619 0.95 0.92682927
0.925 0.925 0.925 0.95121951]
mean value: 0.9315011614401858
key: test_recall
value: [0.75 0.5 0.75 1. 1. 1. 0.8 1. 1. 0.25]
mean value: 0.805
key: train_recall
value: [0.97435897 1. 1. 0.97435897 0.97435897 0.97435897
0.97368421 0.97368421 0.97368421 1. ]
mean value: 0.9818488529014845
key: test_roc_auc
value: [0.8125 0.625 0.6875 0.9375 0.9375 0.9375
0.82857143 1. 0.85714286 0.41071429]
mean value: 0.8033928571428571
key: train_roc_auc
value: [0.95776772 0.98529412 0.98529412 0.95776772 0.9724736 0.96512066
0.96510297 0.96510297 0.96510297 0.98550725]
mean value: 0.9704534119579886
key: test_jcc
value: [0.6 0.33333333 0.42857143 0.8 0.8 0.8
0.66666667 1. 0.71428571 0.14285714]
mean value: 0.6285714285714286
key: train_jcc
value: [0.88372093 0.95121951 0.95121951 0.88372093 0.92682927 0.9047619
0.90243902 0.90243902 0.90243902 0.95121951]
mean value: 0.9160008643275801
MCC on Blind test: 0.07
Accuracy on Blind test: 0.69
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02494788 0.01585054 0.00776625 0.00726771 0.00712013 0.00683975
0.00691533 0.00710177 0.00688958 0.00703335]
mean value: 0.00977323055267334
key: score_time
value: [0.01840568 0.0093627 0.00875354 0.00817943 0.00809598 0.00807095
0.00832725 0.00807333 0.00806046 0.00866556]
mean value: 0.009399485588073731
key: test_mcc
value: [0.42640143 0.40824829 0.11952286 0.47809144 0.15811388 0.35355339
0.35675303 0.29277002 0.47809144 0.41833001]
mean value: 0.3489875814335667
key: train_mcc
value: [0.45416735 0.52159509 0.45416735 0.49964579 0.52383566 0.43117964
0.47315489 0.49023798 0.44470372 0.45631672]
mean value: 0.4749004183145091
key: test_accuracy
value: [0.75 0.75 0.58333333 0.75 0.66666667 0.66666667
0.66666667 0.66666667 0.75 0.72727273]
mean value: 0.6977272727272728
key: train_accuracy
value: [0.75700935 0.78504673 0.75700935 0.77570093 0.78504673 0.74766355
0.76635514 0.77570093 0.75700935 0.75925926]
mean value: 0.7665801315334025
key: test_fscore
value: [0.4 0.57142857 0.44444444 0.66666667 0.33333333 0.6
0.33333333 0.5 0.66666667 0.4 ]
mean value: 0.4915873015873016
key: train_fscore
value: [0.60606061 0.66666667 0.60606061 0.625 0.63492063 0.58461538
0.63768116 0.625 0.59375 0.60606061]
mean value: 0.6185815663804795
key: test_precision
value: [1. 0.66666667 0.4 0.6 0.5 0.5
1. 0.66666667 0.75 1. ]
mean value: 0.7083333333333334
key: train_precision
value: [0.74074074 0.76666667 0.74074074 0.8 0.83333333 0.73076923
0.70967742 0.76923077 0.73076923 0.74074074]
mean value: 0.7562668872346292
key: test_recall
value: [0.25 0.5 0.5 0.75 0.25 0.75 0.2 0.4 0.6 0.25]
mean value: 0.445
key: train_recall
value: [0.51282051 0.58974359 0.51282051 0.51282051 0.51282051 0.48717949
0.57894737 0.52631579 0.5 0.51282051]
mean value: 0.5246288798920378
key: test_roc_auc
value: [0.625 0.6875 0.5625 0.75 0.5625 0.6875
0.6 0.62857143 0.72857143 0.625 ]
mean value: 0.6457142857142857
key: train_roc_auc
value: [0.70493967 0.74340121 0.70493967 0.71964555 0.72699849 0.69211916
0.72425629 0.71967963 0.69927536 0.70568562]
mean value: 0.7140940648394546
key: test_jcc
value: [0.25 0.4 0.28571429 0.5 0.2 0.42857143
0.2 0.33333333 0.5 0.25 ]
mean value: 0.33476190476190476
key: train_jcc
value: [0.43478261 0.5 0.43478261 0.45454545 0.46511628 0.41304348
0.46808511 0.45454545 0.42222222 0.43478261]
mean value: 0.44819058211137036
MCC on Blind test: 0.14
Accuracy on Blind test: 0.75
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00787282 0.00709176 0.00747252 0.00727296 0.00753498 0.0074842
0.00761414 0.00750947 0.00763249 0.00774121]
mean value: 0.00752265453338623
key: score_time
value: [0.00790644 0.00816011 0.00783634 0.00811148 0.00789452 0.00806546
0.00857043 0.00817347 0.00801802 0.00889587]
mean value: 0.008163213729858398
key: test_mcc
value: [1. 0.625 0.11952286 0.70710678 0.47809144 0.70710678
0.37142857 0.84515425 0.29277002 0.60714286]
mean value: 0.5753323572224797
key: train_mcc
value: [0.8165399 0.85945065 0.82420912 0.82726738 0.76153359 0.83287099
0.79235477 0.84830731 0.84110073 0.8789655 ]
mean value: 0.8282599941345357
key: test_accuracy
value: [1. 0.83333333 0.58333333 0.83333333 0.75 0.83333333
0.66666667 0.91666667 0.66666667 0.81818182]
mean value: 0.7901515151515152
key: train_accuracy
value: [0.90654206 0.93457944 0.91588785 0.91588785 0.87850467 0.91588785
0.88785047 0.92523364 0.92523364 0.94444444]
mean value: 0.9150051921079958
key: test_fscore
value: [1. 0.75 0.44444444 0.8 0.66666667 0.8
0.66666667 0.90909091 0.5 0.75 ]
mean value: 0.7286868686868687
key: train_fscore
value: [0.88372093 0.90410959 0.86956522 0.89156627 0.85057471 0.89411765
0.86363636 0.90243902 0.88235294 0.92105263]
mean value: 0.8863135322209726
key: test_precision
value: [1. 0.75 0.4 0.66666667 0.6 0.66666667
0.57142857 0.83333333 0.66666667 0.75 ]
mean value: 0.6904761904761905
key: train_precision
value: [0.80851064 0.97058824 1. 0.84090909 0.77083333 0.82608696
0.76 0.84090909 1. 0.94594595]
mean value: 0.876378329121119
key: test_recall
value: [1. 0.75 0.5 1. 0.75 1. 0.8 1. 0.4 0.75]
mean value: 0.795
key: train_recall
value: [0.97435897 0.84615385 0.76923077 0.94871795 0.94871795 0.97435897
1. 0.97368421 0.78947368 0.8974359 ]
mean value: 0.9122132253711202
key: test_roc_auc
value: [1. 0.8125 0.5625 0.875 0.75 0.875
0.68571429 0.92857143 0.62857143 0.80357143]
mean value: 0.7921428571428571
key: train_roc_auc
value: [0.92100302 0.91572398 0.88461538 0.92288839 0.89347662 0.92835596
0.91304348 0.93611747 0.89473684 0.9342252 ]
mean value: 0.9144186331459181
key: test_jcc
value: [1. 0.6 0.28571429 0.66666667 0.5 0.66666667
0.5 0.83333333 0.33333333 0.6 ]
mean value: 0.5985714285714285
key: train_jcc
value: [0.79166667 0.825 0.76923077 0.80434783 0.74 0.80851064
0.76 0.82222222 0.78947368 0.85365854]
mean value: 0.7964110343300379
MCC on Blind test: 0.04
Accuracy on Blind test: 0.82
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00992465 0.00924659 0.00704741 0.00705194 0.00760007 0.00777459
0.00699854 0.00773525 0.00748038 0.00777173]
mean value: 0.007863116264343262
key: score_time
value: [0.01014447 0.00912404 0.00799847 0.00816655 0.00813746 0.00799036
0.00796533 0.00817347 0.00830007 0.00831962]
mean value: 0.00843198299407959
key: test_mcc
value: [1. 0.40824829 0.40824829 0.625 0.81649658 0.625
0.83666003 0.65714286 0.23904572 0.41833001]
mean value: 0.6034171780666301
key: train_mcc
value: [0.8720951 0.89986237 0.74811148 0.77945561 0.86259524 0.6717753
0.93862091 0.88019137 0.69504805 0.78691217]
mean value: 0.8134667600062208
key: test_accuracy
value: [1. 0.75 0.75 0.83333333 0.91666667 0.83333333
0.91666667 0.83333333 0.58333333 0.72727273]
mean value: 0.8143939393939394
key: train_accuracy
value: [0.93457944 0.95327103 0.87850467 0.89719626 0.93457944 0.8411215
0.97196262 0.94392523 0.82242991 0.89814815]
mean value: 0.9075718241606092
key: test_fscore
value: [1. 0.57142857 0.57142857 0.75 0.85714286 0.75
0.88888889 0.8 0.61538462 0.4 ]
mean value: 0.7204273504273505
key: train_fscore
value: [0.91764706 0.93670886 0.8 0.86075949 0.91358025 0.72131148
0.96 0.91428571 0.8 0.8358209 ]
mean value: 0.8660113745385427
key: test_precision
value: [1. 0.66666667 0.66666667 0.75 1. 0.75
1. 0.8 0.5 1. ]
mean value: 0.8133333333333334
key: train_precision
value: [0.84782609 0.925 1. 0.85 0.88095238 1.
0.97297297 1. 0.66666667 1. ]
mean value: 0.9143418107548542
key: test_recall
value: [1. 0.5 0.5 0.75 0.75 0.75 0.8 0.8 0.8 0.25]
mean value: 0.6900000000000001
key: train_recall
value: [1. 0.94871795 0.66666667 0.87179487 0.94871795 0.56410256
0.94736842 0.84210526 1. 0.71794872]
mean value: 0.8507422402159244
key: test_roc_auc
value: [1. 0.6875 0.6875 0.8125 0.875 0.8125
0.9 0.82857143 0.61428571 0.625 ]
mean value: 0.7842857142857143
key: train_roc_auc
value: [0.94852941 0.95230015 0.83333333 0.89177979 0.93759427 0.78205128
0.96643783 0.92105263 0.86231884 0.85897436]
mean value: 0.8954371900141855
key: test_jcc
value: [1. 0.4 0.4 0.6 0.75 0.6
0.8 0.66666667 0.44444444 0.25 ]
mean value: 0.5911111111111111
key: train_jcc
value: [0.84782609 0.88095238 0.66666667 0.75555556 0.84090909 0.56410256
0.92307692 0.84210526 0.66666667 0.71794872]
mean value: 0.7705809915992983
MCC on Blind test: 0.07
Accuracy on Blind test: 0.91
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.07426667 0.06154084 0.06283951 0.06345463 0.06271338 0.06455684
0.06105089 0.06551576 0.06561875 0.06309128]
mean value: 0.0644648551940918
key: score_time
value: [0.01463723 0.01418447 0.01481771 0.01499844 0.01489115 0.01537299
0.01565957 0.01581383 0.01552248 0.01574159]
mean value: 0.015163946151733398
key: test_mcc
value: [1. 0.625 0.625 0.625 0.83666003 1.
0.52915026 1. 0.65714286 0.81009259]
mean value: 0.7708045733190834
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.83333333 0.83333333 0.83333333 0.91666667 1.
0.75 1. 0.83333333 0.90909091]
mean value: 0.8909090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.75 0.75 0.75 0.88888889 1.
0.57142857 1. 0.8 0.85714286]
mean value: 0.8367460317460318
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.75 0.75 0.75 0.8 1. 1. 1. 0.8 1. ]
mean value: 0.885
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 0.75 1. 1. 0.4 1. 0.8 0.75]
mean value: 0.8200000000000001
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.8125 0.8125 0.8125 0.9375 1.
0.7 1. 0.82857143 0.875 ]
mean value: 0.8778571428571429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.6 0.6 0.6 0.8 1.
0.4 1. 0.66666667 0.75 ]
mean value: 0.7416666666666667
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.78
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02740383 0.02774906 0.03296709 0.04285073 0.033988 0.02595377
0.04620218 0.03556585 0.02733755 0.03265309]
mean value: 0.03326711654663086
key: score_time
value: [0.02362061 0.02275753 0.03781056 0.03313112 0.02986407 0.02139044
0.03054595 0.02594328 0.02176881 0.02311182]
mean value: 0.02699441909790039
key: test_mcc
value: [0.83666003 0.625 0.81649658 1. 0.83666003 1.
0.83666003 1. 0.65714286 0.81009259]
mean value: 0.8418712104973792
key: train_mcc
value: [1. 0.97991726 1. 0.97991726 0.97991726 1.
1. 1. 1. 0.98002018]
mean value: 0.9919771953521386
key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 1. 0.91666667 1.
0.91666667 1. 0.83333333 0.90909091]
mean value: 0.9242424242424242
key: train_accuracy
value: [1. 0.99065421 1. 0.99065421 0.99065421 1.
1. 1. 1. 0.99074074]
mean value: 0.9962703357563171
key: test_fscore
value: [0.88888889 0.75 0.85714286 1. 0.88888889 1.
0.88888889 1. 0.8 0.85714286]
mean value: 0.8930952380952382
key: train_fscore
value: [1. 0.98701299 1. 0.98701299 0.98701299 1.
1. 1. 1. 0.98701299]
mean value: 0.9948051948051948
key: test_precision
value: [0.8 0.75 1. 1. 0.8 1. 1. 1. 0.8 1. ]
mean value: 0.915
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 1. 1. 1. 0.8 1. 0.8 0.75]
mean value: 0.885
key: train_recall
value: [1. 0.97435897 1. 0.97435897 0.97435897 1.
1. 1. 1. 0.97435897]
mean value: 0.9897435897435898
key: test_roc_auc
value: [0.9375 0.8125 0.875 1. 0.9375 1.
0.9 1. 0.82857143 0.875 ]
mean value: 0.9166071428571428
key: train_roc_auc
value: [1. 0.98717949 1. 0.98717949 0.98717949 1.
1. 1. 1. 0.98717949]
mean value: 0.9948717948717949
key: test_jcc
value: [0.8 0.6 0.75 1. 0.8 1.
0.8 1. 0.66666667 0.75 ]
mean value: 0.8166666666666667
key: train_jcc
value: [1. 0.97435897 1. 0.97435897 0.97435897 1.
1. 1. 1. 0.97435897]
mean value: 0.9897435897435898
MCC on Blind test: 0.13
Accuracy on Blind test: 0.86
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.02972174 0.03583384 0.03609109 0.03561258 0.0359695 0.03587818
0.03609157 0.03597355 0.03254533 0.03658676]
mean value: 0.0350304126739502
key: score_time
value: [0.02094769 0.02031541 0.02000165 0.02015448 0.01970243 0.01941609
0.01103115 0.02209592 0.0211103 0.02550483]
mean value: 0.020027995109558105
key: test_mcc
value: [0.42640143 0.15811388 0.40824829 0.63245553 0.15811388 0.40824829
0.35675303 0.07559289 0.11952286 0. ]
mean value: 0.27434501012310836
key: train_mcc
value: [0.94025192 0.94025192 0.97991726 0.92064018 0.92064018 0.92064018
0.93950808 0.93950808 0.93950808 0.94053994]
mean value: 0.9381405840047681
key: test_accuracy
value: [0.75 0.66666667 0.75 0.83333333 0.66666667 0.75
0.66666667 0.58333333 0.58333333 0.63636364]
mean value: 0.6886363636363636
key: train_accuracy
value: [0.97196262 0.97196262 0.99065421 0.96261682 0.96261682 0.96261682
0.97196262 0.97196262 0.97196262 0.97222222]
mean value: 0.9710539979231568
key: test_fscore
value: [0.4 0.33333333 0.57142857 0.66666667 0.33333333 0.57142857
0.33333333 0.28571429 0.44444444 0. ]
mean value: 0.39396825396825397
key: train_fscore
value: [0.96 0.96 0.98701299 0.94594595 0.94594595 0.94594595
0.95890411 0.95890411 0.95890411 0.96 ]
mean value: 0.9581563153617948
key: test_precision
value: [1. 0.5 0.66666667 1. 0.5 0.66666667
1. 0.5 0.5 0. ]
mean value: 0.6333333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0.25 0.5 0.5 0.25 0.5 0.2 0.2 0.4 0. ]
mean value: 0.305
key: train_recall
value: [0.92307692 0.92307692 0.97435897 0.8974359 0.8974359 0.8974359
0.92105263 0.92105263 0.92105263 0.92307692]
mean value: 0.9199055330634278
key: test_roc_auc
value: [0.625 0.5625 0.6875 0.75 0.5625 0.6875
0.6 0.52857143 0.55714286 0.5 ]
mean value: 0.6060714285714286
key: train_roc_auc
value: [0.96153846 0.96153846 0.98717949 0.94871795 0.94871795 0.94871795
0.96052632 0.96052632 0.96052632 0.96153846]
mean value: 0.9599527665317139
key: test_jcc
value: [0.25 0.2 0.4 0.5 0.2 0.4
0.2 0.16666667 0.28571429 0. ]
mean value: 0.26023809523809527
key: train_jcc
value: [0.92307692 0.92307692 0.97435897 0.8974359 0.8974359 0.8974359
0.92105263 0.92105263 0.92105263 0.92307692]
mean value: 0.9199055330634278
MCC on Blind test: 0.05
Accuracy on Blind test: 0.85
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.08681178 0.08099294 0.0833962 0.08435941 0.08145213 0.08759737
0.08979297 0.08592725 0.08576846 0.07664156]
mean value: 0.08427400588989258
key: score_time
value: [0.00886655 0.00919652 0.00933671 0.00866079 0.00926757 0.00908661
0.00926757 0.0088954 0.00944066 0.0094223 ]
mean value: 0.009144067764282227
key: test_mcc
value: [0.83666003 0.625 0.81649658 0.81649658 0.83666003 0.83666003
0.65714286 0.84515425 0.65714286 0.81009259]
mean value: 0.7737505797772892
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 0.91666667
0.83333333 0.91666667 0.83333333 0.90909091]
mean value: 0.8909090909090909
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.75 0.85714286 0.85714286 0.88888889 0.88888889
0.8 0.90909091 0.8 0.85714286]
mean value: 0.8497186147186148
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.75 1. 1. 0.8 0.8
0.8 0.83333333 0.8 1. ]
mean value: 0.8583333333333334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75]
mean value: 0.86
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.8125 0.875 0.875 0.9375 0.9375
0.82857143 0.92857143 0.82857143 0.875 ]
mean value: 0.8835714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.6 0.75 0.75 0.8 0.8
0.66666667 0.83333333 0.66666667 0.75 ]
mean value: 0.7416666666666667
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.82
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00962162 0.01063228 0.01091313 0.01079535 0.01196051 0.0170722
0.01151824 0.01096225 0.02570939 0.01203942]
mean value: 0.01312243938446045
key: score_time
value: [0.01107264 0.01098132 0.01062155 0.01118159 0.01129007 0.01117349
0.01122856 0.01094556 0.01141524 0.01125026]
mean value: 0.01111602783203125
key: test_mcc
value: [0. 0. 0. 0. 0. 0.
0. 0.07559289 0. 0. ]
mean value: 0.007559289460184544
key: train_mcc
value: [0.32183783 0.32183783 0.32183783 0.18223949 0.26021572 0.26021572
0.32843368 0.32843368 0.29834424 0.29306141]
mean value: 0.2916457442021758
key: test_accuracy
value: [0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667
0.58333333 0.58333333 0.58333333 0.63636364]
mean value: 0.6386363636363637
key: train_accuracy
value: [0.69158879 0.69158879 0.69158879 0.65420561 0.6728972 0.6728972
0.70093458 0.70093458 0.69158879 0.68518519]
mean value: 0.6853409484250605
key: test_fscore
value: [0. 0. 0. 0. 0. 0.
0. 0.28571429 0. 0. ]
mean value: 0.028571428571428574
key: train_fscore
value: [0.26666667 0.26666667 0.26666667 0.09756098 0.18604651 0.18604651
0.27272727 0.27272727 0.23255814 0.22727273]
mean value: 0.22749394111277266
key: test_precision
value: [0. 0. 0. 0. 0. 0. 0. 0.5 0. 0. ]
mean value: 0.05
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0. 0. 0. 0. 0. 0. 0. 0.2 0. 0. ]
mean value: 0.02
key: train_recall
value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641 0.1025641
0.15789474 0.15789474 0.13157895 0.12820513]
mean value: 0.12935222672064778
key: test_roc_auc
value: [0.5 0.5 0.5 0.5 0.5 0.5
0.5 0.52857143 0.5 0.5 ]
mean value: 0.5028571428571429
key: train_roc_auc
value: [0.57692308 0.57692308 0.57692308 0.52564103 0.55128205 0.55128205
0.57894737 0.57894737 0.56578947 0.56410256]
mean value: 0.5646761133603239
key: test_jcc
value: [0. 0. 0. 0. 0. 0.
0. 0.16666667 0. 0. ]
mean value: 0.016666666666666666
key: train_jcc
value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641 0.1025641
0.15789474 0.15789474 0.13157895 0.12820513]
mean value: 0.12935222672064778
MCC on Blind test: -0.02
Accuracy on Blind test: 0.95
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0105691 0.01015902 0.00814056 0.00781894 0.00773811 0.00766277
0.00809789 0.0085144 0.00822377 0.008322 ]
mean value: 0.008524656295776367
key: score_time
value: [0.01082993 0.00936484 0.00863528 0.00822663 0.00833321 0.00819302
0.0086484 0.00855327 0.00863814 0.00825906]
mean value: 0.008768177032470703
key: test_mcc
value: [0.63245553 0.40824829 0.35355339 1. 0.625 0.70710678
0.68313005 0.83666003 0.31428571 0.69006556]
mean value: 0.6250505345503478
key: train_mcc
value: [0.79826546 0.8375252 0.89876312 0.85818605 0.85972678 0.87895928
0.81760898 0.83676583 0.89756105 0.83946488]
mean value: 0.8522826622791292
key: test_accuracy
value: [0.83333333 0.75 0.66666667 1. 0.83333333 0.83333333
0.83333333 0.91666667 0.66666667 0.81818182]
mean value: 0.8151515151515152
key: train_accuracy
value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523
0.91588785 0.92523364 0.95327103 0.92592593]
mean value: 0.9318449290411908
key: test_fscore
value: [0.66666667 0.57142857 0.6 1. 0.75 0.8
0.75 0.88888889 0.6 0.8 ]
mean value: 0.7426984126984127
key: train_fscore
value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692
0.88311688 0.89473684 0.93333333 0.8974359 ]
mean value: 0.905377984218757
key: test_precision
value: [1. 0.66666667 0.5 1. 0.75 0.66666667
1. 1. 0.6 0.66666667]
mean value: 0.785
key: train_precision
value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9 0.92307692
0.87179487 0.89473684 0.94594595 0.8974359 ]
mean value: 0.9092125323704271
key: test_recall
value: [0.5 0.5 0.75 1. 0.75 1. 0.6 0.8 0.6 1. ]
mean value: 0.75
key: train_recall
value: [0.87179487 0.87179487 0.92307692 0.8974359 0.92307692 0.92307692
0.89473684 0.89473684 0.92105263 0.8974359 ]
mean value: 0.9018218623481782
key: test_roc_auc
value: [0.75 0.6875 0.6875 1. 0.8125 0.875
0.8 0.9 0.65714286 0.85714286]
mean value: 0.8026785714285715
key: train_roc_auc
value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267 0.93947964
0.91113654 0.91838291 0.94603356 0.91973244]
mean value: 0.9253354836037566
key: test_jcc
value: [0.5 0.4 0.42857143 1. 0.6 0.66666667
0.6 0.8 0.42857143 0.66666667]
mean value: 0.6090476190476191
key: train_jcc
value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093 0.85714286
0.79069767 0.80952381 0.875 0.81395349]
mean value: 0.8277160327855166
MCC on Blind test: 0.08
Accuracy on Blind test: 0.74
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07413816 0.06058931 0.06105781 0.06061363 0.0620904 0.06173635
0.06109071 0.06248355 0.06057048 0.06052804]
mean value: 0.06248984336853027
key: score_time
value: [0.00838947 0.00880098 0.00829411 0.0082829 0.00848746 0.00843048
0.00848484 0.00820589 0.00828862 0.00822687]
mean value: 0.008389163017272949
key: test_mcc
value: [0.63245553 0.40824829 0.35355339 1. 0.625 0.70710678
0.68313005 0.83666003 0.31428571 0.69006556]
mean value: 0.6250505345503478
key: train_mcc
value: [0.79826546 0.8375252 0.89876312 0.85818605 0.85972678 0.87895928
0.81760898 0.83676583 0.89756105 0.83946488]
mean value: 0.8522826622791292
key: test_accuracy
value: [0.83333333 0.75 0.66666667 1. 0.83333333 0.83333333
0.83333333 0.91666667 0.66666667 0.81818182]
mean value: 0.8151515151515152
key: train_accuracy
value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523
0.91588785 0.92523364 0.95327103 0.92592593]
mean value: 0.9318449290411908
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:122: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_fscore
value: [0.66666667 0.57142857 0.6 1. 0.75 0.8
0.75 0.88888889 0.6 0.8 ]
mean value: 0.7426984126984127
key: train_fscore
value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692
0.88311688 0.89473684 0.93333333 0.8974359 ]
mean value: 0.905377984218757
key: test_precision
value: [1. 0.66666667 0.5 1. 0.75 0.66666667
1. 1. 0.6 0.66666667]
mean value: 0.785
key: train_precision
value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9 0.92307692
0.87179487 0.89473684 0.94594595 0.8974359 ]
mean value: 0.9092125323704271
key: test_recall
value: [0.5 0.5 0.75 1. 0.75 1. 0.6 0.8 0.6 1. ]
mean value: 0.75
key: train_recall
value: [0.87179487 0.87179487 0.92307692 0.8974359 0.92307692 0.92307692
0.89473684 0.89473684 0.92105263 0.8974359 ]
mean value: 0.9018218623481782
key: test_roc_auc
value: [0.75 0.6875 0.6875 1. 0.8125 0.875
0.8 0.9 0.65714286 0.85714286]
mean value: 0.8026785714285715
key: train_roc_auc
value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267 0.93947964
0.91113654 0.91838291 0.94603356 0.91973244]
mean value: 0.9253354836037566
key: test_jcc
value: [0.5 0.4 0.42857143 1. 0.6 0.66666667
0.6 0.8 0.42857143 0.66666667]
mean value: 0.6090476190476191
key: train_jcc
value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093 0.85714286
0.79069767 0.80952381 0.875 0.81395349]
mean value: 0.8277160327855166
MCC on Blind test: 0.08
Accuracy on Blind test: 0.74
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01721478 0.01214457 0.01244617 0.01377439 0.01370144 0.01362348
0.01294899 0.01230645 0.01298475 0.0133431 ]
mean value: 0.013448810577392578
key: score_time
value: [0.01065063 0.00836754 0.00845647 0.00828004 0.00818396 0.00835776
0.00841212 0.0085628 0.00873351 0.00875807]
mean value: 0.008676290512084961
key: test_mcc
value: [0.8819171 0.5 0.37796447 0.875 1. 0.60714286
0.76376262 1. 0.64465837 0.60714286]
mean value: 0.7257588278029415
key: train_mcc
value: [0.79411765 0.85331034 0.79599234 0.76678748 0.81031543 0.82480818
0.81031543 0.82480818 0.79688349 0.85400682]
mean value: 0.8131345350406455
key: test_accuracy
value: [0.9375 0.75 0.66666667 0.93333333 1. 0.8
0.86666667 1. 0.8 0.8 ]
mean value: 0.8554166666666667
key: train_accuracy
value: [0.89705882 0.92647059 0.89781022 0.88321168 0.90510949 0.91240876
0.90510949 0.91240876 0.89781022 0.9270073 ]
mean value: 0.9064405324173466
key: test_fscore
value: [0.93333333 0.75 0.70588235 0.93333333 1. 0.8
0.85714286 1. 0.84210526 0.8 ]
mean value: 0.8621797139908595
key: train_fscore
value: [0.89705882 0.92537313 0.89705882 0.88235294 0.90510949 0.91304348
0.90510949 0.91176471 0.89393939 0.92647059]
mean value: 0.9057280866983752
key: test_precision
value: [1. 0.75 0.6 0.875 1. 0.75
1. 1. 0.72727273 0.85714286]
mean value: 0.8559415584415584
key: train_precision
value: [0.89705882 0.93939394 0.91044776 0.89552239 0.91176471 0.91304348
0.89855072 0.91176471 0.921875 0.92647059]
mean value: 0.9125892115075633
key: test_recall
value: [0.875 0.75 0.85714286 1. 1. 0.85714286
0.75 1. 1. 0.75 ]
mean value: 0.8839285714285714
key: train_recall
value: [0.89705882 0.91176471 0.88405797 0.86956522 0.89855072 0.91304348
0.91176471 0.91176471 0.86764706 0.92647059]
mean value: 0.8991687979539642
key: test_roc_auc
value: [0.9375 0.75 0.67857143 0.9375 1. 0.80357143
0.875 1. 0.78571429 0.80357143]
mean value: 0.8571428571428571
key: train_roc_auc
value: [0.89705882 0.92647059 0.89791134 0.88331202 0.90515772 0.91240409
0.90515772 0.91240409 0.89759165 0.92700341]
mean value: 0.9064471440750212
key: test_jcc
value: [0.875 0.6 0.54545455 0.875 1. 0.66666667
0.75 1. 0.72727273 0.66666667]
mean value: 0.7706060606060606
key: train_jcc
value: [0.81333333 0.86111111 0.81333333 0.78947368 0.82666667 0.84
0.82666667 0.83783784 0.80821918 0.8630137 ]
mean value: 0.8279655509871804
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.39265418 0.37149286 0.37123227 0.37943006 0.37867284 0.38540936
0.36591649 0.36693406 0.37429166 0.37283516]
mean value: 0.3758868932723999
key: score_time
value: [0.00942802 0.0091536 0.0086658 0.00927925 0.00937963 0.00886846
0.00882983 0.00927758 0.009166 0.00868988]
mean value: 0.009073805809020997
key: test_mcc
value: [0.8819171 0.62994079 0.49099025 0.76376262 0.73214286 0.60714286
0.6000992 1. 0.64465837 0.33928571]
mean value: 0.6689939758834577
key: train_mcc
value: [0.91215932 0.92657079 0.94201665 0.8978896 0.97080136 0.97122151
0.94201665 0.88320546 1. 0.95630861]
mean value: 0.9402189943658086
key: test_accuracy
value: [0.9375 0.8125 0.73333333 0.86666667 0.86666667 0.8
0.8 1. 0.8 0.66666667]
mean value: 0.8283333333333334
key: train_accuracy
value: [0.95588235 0.96323529 0.97080292 0.94890511 0.98540146 0.98540146
0.97080292 0.94160584 1. 0.97810219]
mean value: 0.9700139544869043
key: test_fscore
value: [0.93333333 0.82352941 0.75 0.875 0.85714286 0.8
0.82352941 1. 0.84210526 0.66666667]
mean value: 0.8371306943830163
key: train_fscore
value: [0.95652174 0.96350365 0.97058824 0.94964029 0.98550725 0.98529412
0.97101449 0.94117647 1. 0.97810219]
mean value: 0.9701348428976124
key: test_precision
value: [1. 0.77777778 0.66666667 0.77777778 0.85714286 0.75
0.77777778 1. 0.72727273 0.71428571]
mean value: 0.8048701298701298
key: train_precision
value: [0.94285714 0.95652174 0.98507463 0.94285714 0.98550725 1.
0.95714286 0.94117647 1. 0.97101449]
mean value: 0.968215171857192
key: test_recall
value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286
0.875 1. 1. 0.625 ]
mean value: 0.8821428571428571
key: train_recall
value: [0.97058824 0.97058824 0.95652174 0.95652174 0.98550725 0.97101449
0.98529412 0.94117647 1. 0.98529412]
mean value: 0.9722506393861893
key: test_roc_auc
value: [0.9375 0.8125 0.74107143 0.875 0.86607143 0.80357143
0.79464286 1. 0.78571429 0.66964286]
mean value: 0.8285714285714286
key: train_roc_auc
value: [0.95588235 0.96323529 0.97090793 0.9488491 0.98540068 0.98550725
0.97090793 0.94160273 1. 0.97815431]
mean value: 0.9700447570332481
key: test_jcc
value: [0.875 0.7 0.6 0.77777778 0.75 0.66666667
0.7 1. 0.72727273 0.5 ]
mean value: 0.7296717171717172
key: train_jcc
value: [0.91666667 0.92957746 0.94285714 0.90410959 0.97142857 0.97101449
0.94366197 0.88888889 1. 0.95714286]
mean value: 0.9425347645398564
MCC on Blind test: 0.07
Accuracy on Blind test: 0.72
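The test_mcc and test_jcc values above can be reproduced by hand from a fold's confusion counts. A minimal sketch, assuming test_jcc denotes the Jaccard index and that the first fold had TP=7, TN=8, FP=0, FN=1. These counts are inferred from the logged precision (1.0), recall (0.875) and accuracy (0.9375) for that fold; they are not printed in the log. The helper names mcc and jaccard are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    total = tp + fp + fn
    return tp / total if total else 0.0


# Inferred (not logged) counts for the first CV fold of LogisticRegressionCV.
tp, tn, fp, fn = 7, 8, 0, 1

fold_mcc = mcc(tp, tn, fp, fn)
fold_jcc = jaccard(tp, fp, fn)
print(round(fold_mcc, 4), round(fold_jcc, 4))  # prints: 0.8819 0.875
```

Both results agree with the first entries of test_mcc (0.8819171) and test_jcc (0.875) logged for this model.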
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00969172 0.00904489 0.00695467 0.00679207 0.00658846 0.00662065
0.00658751 0.00682616 0.00665498 0.00697279]
mean value: 0.007273387908935547
key: score_time
value: [0.01047778 0.01015592 0.00812674 0.0078907 0.00783539 0.00783157
0.00781465 0.00791001 0.0078311 0.00797391]
mean value: 0.008384776115417481
key: test_mcc
value: [0.8819171 0.5 0.33928571 0.56407607 0.49099025 0.60714286
0.46428571 0.73214286 0.64465837 0.07142857]
mean value: 0.5295927517042964
key: train_mcc
value: [0.61098829 0.74337629 0.6462903 0.59999905 0.55137884 0.71313464
0.65613085 0.71021843 0.63063055 0.63867147]
mean value: 0.6500818694571209
key: test_accuracy
value: [0.9375 0.75 0.66666667 0.73333333 0.73333333 0.8
0.73333333 0.86666667 0.8 0.53333333]
mean value: 0.7554166666666666
key: train_accuracy
value: [0.80147059 0.86764706 0.81751825 0.79562044 0.76642336 0.8540146
0.81751825 0.84671533 0.81021898 0.81021898]
mean value: 0.8187365822241305
key: test_fscore
value: [0.94117647 0.75 0.66666667 0.77777778 0.75 0.8
0.75 0.875 0.84210526 0.53333333]
mean value: 0.7686059511523908
key: train_fscore
value: [0.81632653 0.87671233 0.83443709 0.81333333 0.79487179 0.84615385
0.83660131 0.82644628 0.82432432 0.82894737]
mean value: 0.8298154200757712
key: test_precision
value: [0.88888889 0.75 0.625 0.63636364 0.66666667 0.75
0.75 0.875 0.72727273 0.57142857]
mean value: 0.7240620490620491
key: train_precision
value: [0.75949367 0.82051282 0.76829268 0.75308642 0.71264368 0.90163934
0.75294118 0.94339623 0.7625 0.75 ]
mean value: 0.7924506019387709
key: test_recall
value: [1. 0.75 0.71428571 1. 0.85714286 0.85714286
0.75 0.875 1. 0.5 ]
mean value: 0.8303571428571428
key: train_recall
value: [0.88235294 0.94117647 0.91304348 0.88405797 0.89855072 0.79710145
0.94117647 0.73529412 0.89705882 0.92647059]
mean value: 0.8816283034953112
key: test_roc_auc
value: [0.9375 0.75 0.66964286 0.75 0.74107143 0.80357143
0.73214286 0.86607143 0.78571429 0.53571429]
mean value: 0.7571428571428571
key: train_roc_auc
value: [0.80147059 0.86764706 0.81681586 0.79497016 0.76545183 0.85443308
0.81841432 0.84590793 0.81084825 0.81106138]
mean value: 0.8187020460358057
key: test_jcc
value: [0.88888889 0.6 0.5 0.63636364 0.6 0.66666667
0.6 0.77777778 0.72727273 0.36363636]
mean value: 0.636060606060606
key: train_jcc
value: [0.68965517 0.7804878 0.71590909 0.68539326 0.65957447 0.73333333
0.71910112 0.70422535 0.70114943 0.70786517]
mean value: 0.7096694197581203
MCC on Blind test: 0.03
Accuracy on Blind test: 0.49
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00759959 0.00745654 0.0069356 0.00687218 0.00700045 0.00688291
0.0068655 0.00682497 0.00688171 0.00696039]
mean value: 0.007027983665466309
key: score_time
value: [0.00797582 0.00790715 0.00785089 0.0078764 0.00810456 0.00792956
0.0079031 0.00789905 0.00793099 0.00804949]
mean value: 0.007942700386047363
key: test_mcc
value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062
0.46428571 0.73214286 0.33928571 0.32732684]
mean value: 0.36677095205019633
key: train_mcc
value: [0.5008673 0.53311399 0.52059257 0.45151662 0.49006025 0.5360985
0.52559229 0.51215762 0.49197671 0.53517487]
mean value: 0.5097150730382196
key: test_accuracy
value: [0.6875 0.625 0.53333333 0.73333333 0.73333333 0.53333333
0.73333333 0.86666667 0.66666667 0.66666667]
mean value: 0.6779166666666666
key: train_accuracy
value: [0.75 0.76470588 0.75912409 0.72262774 0.74452555 0.76642336
0.75912409 0.75182482 0.74452555 0.76642336]
mean value: 0.7529304422498926
key: test_fscore
value: [0.70588235 0.57142857 0.53333333 0.75 0.66666667 0.63157895
0.75 0.875 0.66666667 0.70588235]
mean value: 0.6856438891346012
key: train_fscore
value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192
0.7755102 0.77027027 0.75524476 0.77464789]
mean value: 0.7665740884664326
key: test_precision
value: [0.66666667 0.66666667 0.5 0.66666667 0.8 0.5
0.75 0.875 0.71428571 0.66666667]
mean value: 0.680595238095238
key: train_precision
value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974
0.72151899 0.7125 0.72 0.74324324]
mean value: 0.726840504690327
key: test_recall
value: [0.75 0.5 0.57142857 0.85714286 0.57142857 0.85714286
0.75 0.875 0.625 0.75 ]
mean value: 0.7107142857142857
key: train_recall
value: [0.77941176 0.82352941 0.8115942 0.8115942 0.7826087 0.82608696
0.83823529 0.83823529 0.79411765 0.80882353]
mean value: 0.8114236999147485
key: test_roc_auc
value: [0.6875 0.625 0.53571429 0.74107143 0.72321429 0.55357143
0.73214286 0.86607143 0.66964286 0.66071429]
mean value: 0.6794642857142857
key: train_roc_auc
value: [0.75 0.76470588 0.75873828 0.72197357 0.74424552 0.76598465
0.75969736 0.75245098 0.74488491 0.76673061]
mean value: 0.7529411764705882
key: test_jcc
value: [0.54545455 0.4 0.36363636 0.6 0.5 0.46153846
0.6 0.77777778 0.5 0.54545455]
mean value: 0.5293861693861693
key: train_jcc
value: [0.6091954 0.63636364 0.62921348 0.59574468 0.60674157 0.64044944
0.63333333 0.62637363 0.60674157 0.63218391]
mean value: 0.6216340654682218
MCC on Blind test: 0.1
Accuracy on Blind test: 0.6
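Note on reading the per-fold values: the first-fold Naive Bayes test scores logged above (test_accuracy 0.6875, test_precision 0.66666667, test_recall 0.75, test_fscore 0.70588235, test_jcc 0.54545455, test_mcc 0.37796447) are all consistent with a single 16-sample confusion matrix. The sketch below recomputes them from the standard definitions. The counts (tp=6, fp=3, fn=2, tn=5) are an assumption inferred from the logged values; the log itself does not print them, and binary_metrics is a hypothetical helper, not a function from the original script.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute the binary-classification metrics named in the log
    from confusion-matrix counts (hypothetical helper for illustration)."""
    # Denominator of the Matthews correlation coefficient
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),   # F1 score
        "jcc": tp / (tp + fp + fn),              # Jaccard index
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Assumed counts, inferred from the logged fold-1 values (not printed by the log)
fold1 = binary_metrics(tp=6, fp=3, fn=2, tn=5)
for key, value in fold1.items():
    print(f"test_{key}: {value:.8f}")
```

With so few samples per fold, one changed prediction moves accuracy by 1/16, which may explain much of the fold-to-fold spread in the test_* arrays.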
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00746417 0.00665522 0.00723457 0.00721335 0.00727034 0.00734687
0.00666738 0.00743985 0.00722885 0.00735736]
mean value: 0.007187795639038086
key: score_time
value: [0.009269 0.00891018 0.00945759 0.00945568 0.00962353 0.01024604
0.00979686 0.00947714 0.00947309 0.00945425]
mean value: 0.009516334533691407
key: test_mcc
value: [0.51639778 0.25819889 0.33928571 0.66143783 0.76376262 0.60714286
0.37796447 0.75592895 0.64465837 0.47245559]
mean value: 0.5397233065771696
key: train_mcc
value: [0.63242133 0.69486799 0.73721228 0.640228 0.69398264 0.64981886
0.69976319 0.63512361 0.69352089 0.63574336]
mean value: 0.6712682142948946
key: test_accuracy
value: [0.75 0.625 0.66666667 0.8 0.86666667 0.8
0.66666667 0.86666667 0.8 0.73333333]
mean value: 0.7575000000000001
key: train_accuracy
value: [0.81617647 0.84558824 0.86861314 0.81751825 0.84671533 0.82481752
0.84671533 0.81751825 0.84671533 0.81751825]
mean value: 0.8347896092743666
key: test_fscore
value: [0.77777778 0.57142857 0.66666667 0.82352941 0.875 0.8
0.61538462 0.88888889 0.84210526 0.77777778]
mean value: 0.7638558972846898
key: train_fscore
value: [0.81751825 0.85314685 0.86956522 0.82993197 0.85106383 0.82857143
0.85517241 0.81751825 0.84671533 0.82014388]
mean value: 0.8389347425188644
key: test_precision
value: [0.7 0.66666667 0.625 0.7 0.77777778 0.75
0.8 0.8 0.72727273 0.7 ]
mean value: 0.7246717171717172
key: train_precision
value: [0.8115942 0.81333333 0.86956522 0.78205128 0.83333333 0.81690141
0.80519481 0.8115942 0.84057971 0.8028169 ]
mean value: 0.8186964397105242
key: test_recall
value: [0.875 0.5 0.71428571 1. 1. 0.85714286
0.5 1. 1. 0.875 ]
mean value: 0.8321428571428572
key: train_recall
value: [0.82352941 0.89705882 0.86956522 0.88405797 0.86956522 0.84057971
0.91176471 0.82352941 0.85294118 0.83823529]
mean value: 0.8610826939471441
key: test_roc_auc
value: [0.75 0.625 0.66964286 0.8125 0.875 0.80357143
0.67857143 0.85714286 0.78571429 0.72321429]
mean value: 0.7580357142857143
key: train_roc_auc
value: [0.81617647 0.84558824 0.86860614 0.81702899 0.84654731 0.82470162
0.8471867 0.81756181 0.84676044 0.81766837]
mean value: 0.8347826086956521
key: test_jcc
value: [0.63636364 0.4 0.5 0.7 0.77777778 0.66666667
0.44444444 0.8 0.72727273 0.63636364]
mean value: 0.6288888888888888
key: train_jcc
value: [0.69135802 0.74390244 0.76923077 0.70930233 0.74074074 0.70731707
0.74698795 0.69135802 0.73417722 0.69512195]
mean value: 0.7229496515347358
MCC on Blind test: 0.05
Accuracy on Blind test: 0.61
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.0097928 0.00793552 0.0076406 0.00769567 0.00771952 0.00770187
0.00762033 0.00775814 0.00766444 0.00767016]
mean value: 0.007919907569885254
key: score_time
value: [0.00912237 0.00797772 0.00790691 0.00800848 0.00794578 0.00804377
0.00801921 0.00795007 0.00798464 0.00796342]
mean value: 0.008092236518859864
key: test_mcc
value: [0.75 0.5 0.19642857 0.76376262 0.73214286 0.73214286
0.66143783 1. 0.64465837 0.60714286]
mean value: 0.6587715957669568
key: train_mcc
value: [0.79446135 0.76470588 0.78182997 0.82480818 0.79590547 0.79590547
0.79560955 0.79560955 0.78298457 0.78107015]
mean value: 0.7912890152297882
key: test_accuracy
value: [0.875 0.75 0.6 0.86666667 0.86666667 0.86666667
0.8 1. 0.8 0.8 ]
mean value: 0.8225
key: train_accuracy
value: [0.89705882 0.88235294 0.89051095 0.91240876 0.89781022 0.89781022
0.89781022 0.89781022 0.89051095 0.89051095]
mean value: 0.8954594246457708
key: test_fscore
value: [0.875 0.75 0.57142857 0.875 0.85714286 0.85714286
0.76923077 1. 0.84210526 0.8 ]
mean value: 0.819705031810295
key: train_fscore
value: [0.89552239 0.88235294 0.88888889 0.91304348 0.9 0.9
0.89705882 0.89705882 0.88549618 0.88888889]
mean value: 0.894831041553975
key: test_precision
value: [0.875 0.75 0.57142857 0.77777778 0.85714286 0.85714286
1. 1. 0.72727273 0.85714286]
mean value: 0.8272907647907648
key: train_precision
value: [0.90909091 0.88235294 0.90909091 0.91304348 0.88732394 0.88732394
0.89705882 0.89705882 0.92063492 0.89552239]
mean value: 0.8998501080696548
key: test_recall
value: [0.875 0.75 0.57142857 1. 0.85714286 0.85714286
0.625 1. 1. 0.75 ]
mean value: 0.8285714285714285
key: train_recall
value: [0.88235294 0.88235294 0.86956522 0.91304348 0.91304348 0.91304348
0.89705882 0.89705882 0.85294118 0.88235294]
mean value: 0.8902813299232737
key: test_roc_auc
value: [0.875 0.75 0.59821429 0.875 0.86607143 0.86607143
0.8125 1. 0.78571429 0.80357143]
mean value: 0.8232142857142857
key: train_roc_auc
value: [0.89705882 0.88235294 0.89066496 0.91240409 0.89769821 0.89769821
0.89780477 0.89780477 0.8902387 0.89045183]
mean value: 0.8954177323103154
key: test_jcc
value: [0.77777778 0.6 0.4 0.77777778 0.75 0.75
0.625 1. 0.72727273 0.66666667]
mean value: 0.7074494949494949
key: train_jcc
value: [0.81081081 0.78947368 0.8 0.84 0.81818182 0.81818182
0.81333333 0.81333333 0.79452055 0.8 ]
mean value: 0.8097835345996846
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.61664748 0.47898817 0.45502543 0.50637126 0.53669834 0.53992033
0.47287774 0.47633958 0.47858143 0.62103176]
mean value: 0.5182481527328491
key: score_time
value: [0.01329303 0.01312971 0.01097465 0.01341534 0.01492548 0.01332402
0.01094341 0.01338291 0.01885128 0.01098609]
mean value: 0.013322591781616211
key: test_mcc
value: [0.8819171 0.51639778 0.37796447 1. 0.60714286 0.60714286
0.46428571 0.87287156 0.64465837 0.60714286]
mean value: 0.6579523574070305
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.75 0.66666667 1. 0.8 0.8
0.73333333 0.93333333 0.8 0.8 ]
mean value: 0.8220833333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.71428571 0.70588235 1. 0.8 0.8
0.75 0.94117647 0.84210526 0.8 ]
mean value: 0.8286783134306354
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 0.6 1. 0.75 0.75
0.75 0.88888889 0.72727273 0.85714286]
mean value: 0.8156637806637806
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.625 0.85714286 1. 0.85714286 0.85714286
0.75 1. 1. 0.75 ]
mean value: 0.8571428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.75 0.67857143 1. 0.80357143 0.80357143
0.73214286 0.92857143 0.78571429 0.80357143]
mean value: 0.8223214285714285
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.55555556 0.54545455 1. 0.66666667 0.66666667
0.6 0.88888889 0.72727273 0.66666667]
mean value: 0.7192171717171717
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.02302074 0.00763321 0.00718451 0.00732636 0.00716519 0.00725317
0.00720024 0.00724506 0.00735903 0.00749445]
mean value: 0.00888819694519043
key: score_time
value: [0.01008129 0.00808263 0.00788021 0.00784898 0.00779343 0.00776839
0.00773787 0.00773025 0.00829577 0.00780368]
mean value: 0.00810225009918213
key: test_mcc
value: [0.8819171 1. 1. 1. 0.6000992 0.73214286
0.87287156 0.75592895 0.73214286 0.56407607]
mean value: 0.8139178597903081
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 1. 1. 1. 0.8 0.86666667
0.93333333 0.86666667 0.86666667 0.73333333]
mean value: 0.9004166666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 1. 1. 1. 0.76923077 0.85714286
0.94117647 0.88888889 0.875 0.66666667]
mean value: 0.8939282123105652
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 1. 1. 0.83333333 0.85714286
0.88888889 0.8 0.875 1. ]
mean value: 0.9143253968253968
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.71428571 0.85714286
1. 1. 0.875 0.5 ]
mean value: 0.8946428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 1. 1. 1. 0.79464286 0.86607143
0.92857143 0.85714286 0.86607143 0.75 ]
mean value: 0.9
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 1. 1. 1. 0.625 0.75
0.88888889 0.8 0.77777778 0.5 ]
mean value: 0.8230555555555555
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.86
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.07873535 0.07909012 0.07848859 0.07919955 0.07896852 0.07857132
0.08103371 0.08165836 0.07918024 0.08250403]
mean value: 0.07974298000335693
key: score_time
value: [0.01622057 0.01643443 0.01677704 0.01642728 0.01640582 0.01631761
0.01749635 0.01630569 0.01675391 0.01715064]
mean value: 0.01662893295288086
key: test_mcc
value: [0.8819171 0.51639778 0.49099025 1. 0.875 0.73214286
0.76376262 1. 0.75592895 0.875 ]
mean value: 0.7891139555200787
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.75 0.73333333 1. 0.93333333 0.86666667
0.86666667 1. 0.86666667 0.93333333]
mean value: 0.88875
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.71428571 0.75 1. 0.93333333 0.85714286
0.85714286 1. 0.88888889 0.93333333]
mean value: 0.8867460317460317
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.83333333 0.66666667 1. 0.875 0.85714286
1. 1. 0.8 1. ]
mean value: 0.9032142857142857
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.625 0.85714286 1. 1. 0.85714286
0.75 1. 1. 0.875 ]
mean value: 0.8839285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.75 0.74107143 1. 0.9375 0.86607143
0.875 1. 0.85714286 0.9375 ]
mean value: 0.8901785714285715
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.55555556 0.6 1. 0.875 0.75
0.75 1. 0.8 0.875 ]
mean value: 0.8080555555555555
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00679111 0.00661206 0.00676751 0.00668883 0.00663257 0.00666237
0.00663257 0.00670314 0.00696945 0.00672388]
mean value: 0.006718349456787109
key: score_time
value: [0.00769448 0.00768995 0.00776768 0.00772476 0.00775385 0.00774527
0.00774169 0.00774693 0.00784659 0.00774169]
mean value: 0.007745289802551269
key: test_mcc
value: [0.40451992 0.40451992 0.32732684 1. 0.76376262 0.46428571
0.13363062 0.87287156 0.73214286 0.21821789]
mean value: 0.5321277929700597
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.6875 0.6875 0.66666667 1. 0.86666667 0.73333333
0.53333333 0.93333333 0.86666667 0.6 ]
mean value: 0.7575
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73684211 0.61538462 0.61538462 1. 0.875 0.71428571
0.36363636 0.94117647 0.875 0.57142857]
mean value: 0.7308138455971274
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.63636364 0.8 0.66666667 1. 0.77777778 0.71428571
0.66666667 0.88888889 0.875 0.66666667]
mean value: 0.7692316017316018
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.5 0.57142857 1. 1. 0.71428571
0.25 1. 0.875 0.5 ]
mean value: 0.7285714285714285
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.6875 0.6875 0.66071429 1. 0.875 0.73214286
0.55357143 0.92857143 0.86607143 0.60714286]
mean value: 0.7598214285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58333333 0.44444444 0.44444444 1. 0.77777778 0.55555556
0.22222222 0.88888889 0.77777778 0.4 ]
mean value: 0.6094444444444445
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [0.98909354 0.98514724 1.04687738 0.98289633 0.98306084 0.98102474
0.9808023 0.98257184 0.98120975 0.97967005]
mean value: 0.9892354011535645
key: score_time
value: [0.09175563 0.08826041 0.08760238 0.08777761 0.08745551 0.08774495
0.08790946 0.08762598 0.08742118 0.08845329]
mean value: 0.08820064067840576
key: test_mcc
value: [0.8819171 0.75 0.76376262 1. 0.875 0.73214286
0.60714286 0.87287156 0.87287156 0.76376262]
mean value: 0.8119471171513797
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.875 0.86666667 1. 0.93333333 0.86666667
0.8 0.93333333 0.93333333 0.86666667]
mean value: 0.90125
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.875 0.875 1. 0.93333333 0.85714286
0.8 0.94117647 0.94117647 0.85714286]
mean value: 0.9013305322128852
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 0.77777778 1. 0.875 0.85714286
0.85714286 0.88888889 0.88888889 1. ]
mean value: 0.901984126984127
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 1. 1. 1. 0.85714286
0.75 1. 1. 0.75 ]
mean value: 0.9107142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.875 0.875 1. 0.9375 0.86607143
0.80357143 0.92857143 0.92857143 0.875 ]
mean value: 0.9026785714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.77777778 0.77777778 1. 0.875 0.75
0.66666667 0.88888889 0.88888889 0.75 ]
mean value: 0.825
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.8277626 0.8267982 0.83943486 0.95936847 0.89719224 0.9292078
0.87691498 0.90619445 0.85252666 0.84288502]
mean value: 0.8758285284042359
key: score_time
value: [0.23116565 0.20367575 0.20599627 0.15598726 0.19488597 0.1595974
0.24725604 0.22806668 0.2303443 0.21640897]
mean value: 0.20733842849731446
key: test_mcc
value: [0.8819171 0.75 0.76376262 1. 0.875 0.73214286
0.60714286 0.87287156 0.87287156 0.66143783]
mean value: 0.8017146383453971
key: train_mcc
value: [0.98540068 0.98540068 0.95630861 0.98550418 0.98550418 0.98550418
0.98550418 0.97080136 0.97080136 0.98550418]
mean value: 0.9796233587390223
key: test_accuracy
value: [0.9375 0.875 0.86666667 1. 0.93333333 0.86666667
0.8 0.93333333 0.93333333 0.8 ]
mean value: 0.8945833333333334
key: train_accuracy
value: [0.99264706 0.99264706 0.97810219 0.99270073 0.99270073 0.99270073
0.99270073 0.98540146 0.98540146 0.99270073]
mean value: 0.9897702876771146
key: test_fscore
value: [0.93333333 0.875 0.875 1. 0.93333333 0.85714286
0.8 0.94117647 0.94117647 0.76923077]
mean value: 0.8925393234216764
key: train_fscore
value: [0.99259259 0.99259259 0.97810219 0.99280576 0.99280576 0.99280576
0.99259259 0.98529412 0.98529412 0.99259259]
mean value: 0.9897478061632561
key: test_precision
value: [1. 0.875 0.77777778 1. 0.875 0.85714286
0.85714286 0.88888889 0.88888889 1. ]
mean value: 0.901984126984127
key: train_precision
value: [1. 1. 0.98529412 0.98571429 0.98571429 0.98571429
1. 0.98529412 0.98529412 1. ]
mean value: 0.9913025210084034
key: test_recall
value: [0.875 0.875 1. 1. 1. 0.85714286
0.75 1. 1. 0.625 ]
mean value: 0.8982142857142857
key: train_recall
value: [0.98529412 0.98529412 0.97101449 1. 1. 1.
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.9882779198635976
key: test_roc_auc
value: [0.9375 0.875 0.875 1. 0.9375 0.86607143
0.80357143 0.92857143 0.92857143 0.8125 ]
mean value: 0.8964285714285715
key: train_roc_auc
value: [0.99264706 0.99264706 0.97815431 0.99264706 0.99264706 0.99264706
0.99264706 0.98540068 0.98540068 0.99264706]
mean value: 0.9897485080988918
key: test_jcc
value: [0.875 0.77777778 0.77777778 1. 0.875 0.75
0.66666667 0.88888889 0.88888889 0.625 ]
mean value: 0.8125
key: train_jcc
value: [0.98529412 0.98529412 0.95714286 0.98571429 0.98571429 0.98571429
0.98529412 0.97101449 0.97101449 0.98529412]
mean value: 0.9797491170381196
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01690888 0.00677323 0.00677204 0.00677943 0.0067265 0.00685477
0.00684571 0.00681353 0.00681353 0.00683665]
mean value: 0.00781242847442627
key: score_time
value: [0.01041579 0.00778389 0.00794005 0.00778031 0.00778699 0.00781918
0.00782156 0.00780797 0.00778556 0.0078373 ]
mean value: 0.008077859878540039
key: test_mcc
value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062
0.46428571 0.73214286 0.33928571 0.32732684]
mean value: 0.36677095205019633
key: train_mcc
value: [0.5008673 0.53311399 0.52059257 0.45151662 0.49006025 0.5360985
0.52559229 0.51215762 0.49197671 0.53517487]
mean value: 0.5097150730382196
key: test_accuracy
value: [0.6875 0.625 0.53333333 0.73333333 0.73333333 0.53333333
0.73333333 0.86666667 0.66666667 0.66666667]
mean value: 0.6779166666666666
key: train_accuracy
value: [0.75 0.76470588 0.75912409 0.72262774 0.74452555 0.76642336
0.75912409 0.75182482 0.74452555 0.76642336]
mean value: 0.7529304422498926
key: test_fscore
value: [0.70588235 0.57142857 0.53333333 0.75 0.66666667 0.63157895
0.75 0.875 0.66666667 0.70588235]
mean value: 0.6856438891346012
key: train_fscore
value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192
0.7755102 0.77027027 0.75524476 0.77464789]
mean value: 0.7665740884664326
key: test_precision
value: [0.66666667 0.66666667 0.5 0.66666667 0.8 0.5
0.75 0.875 0.71428571 0.66666667]
mean value: 0.680595238095238
key: train_precision
value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974
0.72151899 0.7125 0.72 0.74324324]
mean value: 0.726840504690327
key: test_recall
value: [0.75 0.5 0.57142857 0.85714286 0.57142857 0.85714286
0.75 0.875 0.625 0.75 ]
mean value: 0.7107142857142857
key: train_recall
value: [0.77941176 0.82352941 0.8115942 0.8115942 0.7826087 0.82608696
0.83823529 0.83823529 0.79411765 0.80882353]
mean value: 0.8114236999147485
key: test_roc_auc
value: [0.6875 0.625 0.53571429 0.74107143 0.72321429 0.55357143
0.73214286 0.86607143 0.66964286 0.66071429]
mean value: 0.6794642857142857
key: train_roc_auc
value: [0.75 0.76470588 0.75873828 0.72197357 0.74424552 0.76598465
0.75969736 0.75245098 0.74488491 0.76673061]
mean value: 0.7529411764705882
key: test_jcc
value: [0.54545455 0.4 0.36363636 0.6 0.5 0.46153846
0.6 0.77777778 0.5 0.54545455]
mean value: 0.5293861693861693
key: train_jcc
value: [0.6091954 0.63636364 0.62921348 0.59574468 0.60674157 0.64044944
0.63333333 0.62637363 0.60674157 0.63218391]
mean value: 0.6216340654682218
MCC on Blind test: 0.1
Accuracy on Blind test: 0.6
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.09977555 0.03077435 0.03092337 0.03185725 0.03266478 0.20152545
0.03012586 0.03033113 0.03158212 0.03259635]
mean value: 0.05521562099456787
key: score_time
value: [0.01020741 0.00965858 0.00987267 0.0099175 0.01043272 0.01017642
0.00950527 0.0099225 0.00961161 0.00984406]
mean value: 0.009914875030517578
key: test_mcc
value: [1. 0.75 1. 1. 0.73214286 1.
0.87287156 1. 0.87287156 0.76376262]
mean value: 0.8991648594856769
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.875 1. 1. 0.86666667 1.
0.93333333 1. 0.93333333 0.86666667]
mean value: 0.9475
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.875 1. 1. 0.85714286 1.
0.94117647 1. 0.94117647 0.85714286]
mean value: 0.9471638655462185
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 1. 1. 0.85714286 1.
0.88888889 1. 0.88888889 1. ]
mean value: 0.9509920634920634
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 1. 1. 0.85714286 1.
1. 1. 1. 0.75 ]
mean value: 0.9482142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.875 1. 1. 0.86607143 1.
0.92857143 1. 0.92857143 0.875 ]
mean value: 0.9473214285714285
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.77777778 1. 1. 0.75 1.
0.88888889 1. 0.88888889 0.75 ]
mean value: 0.9055555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.00941396 0.01151013 0.01147294 0.01190257 0.0120573 0.01343966
0.01201916 0.01195407 0.01195812 0.01198363]
mean value: 0.011771154403686524
key: score_time
value: [0.01016879 0.00986719 0.01031709 0.01051497 0.01036811 0.0106349
0.01084495 0.01056862 0.01060319 0.01060867]
mean value: 0.010449647903442383
key: test_mcc
value: [1. 0.62994079 0.49099025 1. 0.875 0.73214286
0.87287156 1. 0.75592895 0.75592895]
mean value: 0.811280335150343
key: train_mcc
value: [0.91215932 0.95681396 0.92944673 0.88466669 0.89863497 0.94199209
0.90025835 0.9139999 0.91281179 0.87099729]
mean value: 0.9121781087453906
key: test_accuracy
value: [1. 0.8125 0.73333333 1. 0.93333333 0.86666667
0.93333333 1. 0.86666667 0.86666667]
mean value: 0.90125
key: train_accuracy
value: [0.95588235 0.97794118 0.96350365 0.94160584 0.94890511 0.97080292
0.94890511 0.95620438 0.95620438 0.93430657]
mean value: 0.9554261485616145
key: test_fscore
value: [1. 0.82352941 0.75 1. 0.93333333 0.85714286
0.94117647 1. 0.88888889 0.88888889]
mean value: 0.908295985060691
key: train_fscore
value: [0.95652174 0.97841727 0.96503497 0.94366197 0.95035461 0.97142857
0.95035461 0.95714286 0.95652174 0.93617021]
mean value: 0.9565608542509413
key: test_precision
value: [1. 0.77777778 0.66666667 1. 0.875 0.85714286
0.88888889 1. 0.8 0.8 ]
mean value: 0.866547619047619
key: train_precision
value: [0.94285714 0.95774648 0.93243243 0.91780822 0.93055556 0.95774648
0.91780822 0.93055556 0.94285714 0.90410959]
mean value: 0.9334476814401569
key: test_recall
value: [1. 0.875 0.85714286 1. 1. 0.85714286
1. 1. 1. 1. ]
mean value: 0.9589285714285715
key: train_recall
value: [0.97058824 1. 1. 0.97101449 0.97101449 0.98550725
0.98529412 0.98529412 0.97058824 0.97058824]
mean value: 0.9809889173060529
key: test_roc_auc
value: [1. 0.8125 0.74107143 1. 0.9375 0.86607143
0.92857143 1. 0.85714286 0.85714286]
mean value: 0.9
key: train_roc_auc
value: [0.95588235 0.97794118 0.96323529 0.9413896 0.94874254 0.9706948
0.9491688 0.95641517 0.95630861 0.93456948]
mean value: 0.9554347826086956
key: test_jcc
value: [1. 0.7 0.6 1. 0.875 0.75
0.88888889 1. 0.8 0.8 ]
mean value: 0.8413888888888889
key: train_jcc
value: [0.91666667 0.95774648 0.93243243 0.89333333 0.90540541 0.94444444
0.90540541 0.91780822 0.91666667 0.88 ]
mean value: 0.9169909052405676
MCC on Blind test: 0.06
Accuracy on Blind test: 0.65
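Note on the key/value/mean blocks above: the key names (fit_time, score_time, test_mcc, train_mcc, ...) match what scikit-learn's cross_validate returns when it is given a dict of scorers and return_train_score=True. The following is a minimal sketch of how output of this shape could be produced, not the script that wrote this log. The synthetic data, the single MinMaxScaler step, the shuffled StratifiedKFold, and the scorer definitions (in particular ROC AUC computed from hard predictions) are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import make_scorer, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): the real feature table is not part of this log.
X, y = make_classification(n_samples=150, n_features=10, random_state=42)

# Scorer names chosen so the result keys match the log (test_mcc, train_jcc, ...).
# The exact scorer definitions used by the original script are an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": "jaccard",
}

# Simplified pipeline (assumption): scaling only, no categorical encoding.
pipe = Pipeline([("prep", MinMaxScaler()), ("model", LinearDiscriminantAnalysis())])
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so ten folds give the ten values per key seen in the log.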
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02651024 0.0071528 0.00675678 0.00662589 0.00688267 0.00663829
0.00680137 0.00680256 0.00683355 0.00683928]
mean value: 0.008784341812133788
key: score_time
value: [0.01571369 0.00825262 0.00792074 0.0078783 0.00784111 0.00788522
0.00770473 0.00788617 0.00793123 0.00775051]
mean value: 0.008676433563232422
key: test_mcc
value: [0.62994079 0.37796447 0.21821789 0.60714286 0.73214286 0.26189246
0.66143783 0.87287156 0.46428571 0.46428571]
mean value: 0.529018214646944
key: train_mcc
value: [0.55979287 0.57408838 0.62076318 0.57703846 0.54864511 0.60584099
0.57730871 0.51887407 0.56235346 0.56235346]
mean value: 0.5707058671664582
key: test_accuracy
value: [0.8125 0.6875 0.6 0.8 0.86666667 0.6
0.8 0.93333333 0.73333333 0.73333333]
mean value: 0.7566666666666667
key: train_accuracy
value: [0.77941176 0.78676471 0.81021898 0.78832117 0.77372263 0.80291971
0.78832117 0.75912409 0.7810219 0.7810219 ]
mean value: 0.785084800343495
key: test_fscore
value: [0.8 0.66666667 0.625 0.8 0.85714286 0.66666667
0.76923077 0.94117647 0.75 0.75 ]
mean value: 0.7625883430295195
key: train_fscore
value: [0.78571429 0.79136691 0.80882353 0.79432624 0.78321678 0.8057554
0.79136691 0.76258993 0.7826087 0.7826087 ]
mean value: 0.7888377367472581
key: test_precision
value: [0.85714286 0.71428571 0.55555556 0.75 0.85714286 0.54545455
1. 0.88888889 0.75 0.75 ]
mean value: 0.7668470418470419
key: train_precision
value: [0.76388889 0.77464789 0.82089552 0.77777778 0.75675676 0.8
0.77464789 0.74647887 0.77142857 0.77142857]
mean value: 0.775795073655595
key: test_recall
value: [0.75 0.625 0.71428571 0.85714286 0.85714286 0.85714286
0.625 1. 0.75 0.75 ]
mean value: 0.7785714285714286
key: train_recall
value: [0.80882353 0.80882353 0.79710145 0.8115942 0.8115942 0.8115942
0.80882353 0.77941176 0.79411765 0.79411765]
mean value: 0.8026001705029838
key: test_roc_auc
value: [0.8125 0.6875 0.60714286 0.80357143 0.86607143 0.61607143
0.8125 0.92857143 0.73214286 0.73214286]
mean value: 0.7598214285714285
key: train_roc_auc
value: [0.77941176 0.78676471 0.81031543 0.78815004 0.77344416 0.80285592
0.78846974 0.7592711 0.78111679 0.78111679]
mean value: 0.7850916453537937
key: test_jcc
value: [0.66666667 0.5 0.45454545 0.66666667 0.75 0.5
0.625 0.88888889 0.6 0.6 ]
mean value: 0.6251767676767677
key: train_jcc
value: [0.64705882 0.6547619 0.67901235 0.65882353 0.64367816 0.6746988
0.6547619 0.61627907 0.64285714 0.64285714]
mean value: 0.651478881972599
MCC on Blind test: 0.11
Accuracy on Blind test: 0.62
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00804782 0.00781918 0.00786948 0.00782323 0.00760174 0.00795841
0.00802374 0.00730324 0.00732517 0.00728822]
mean value: 0.0077060222625732425
key: score_time
value: [0.00777936 0.00796533 0.00840831 0.00785279 0.00842953 0.00844431
0.00777602 0.00784135 0.00779104 0.00782919]
mean value: 0.008011722564697265
key: test_mcc
value: [0.8819171 0.62994079 0.49099025 1. 0.73214286 0.60714286
0.6000992 1. 0.64465837 0.6000992 ]
mean value: 0.7186990626871869
key: train_mcc
value: [0.89949371 0.91215932 0.92791659 0.88466669 0.94199209 0.94160273
0.88938138 0.8687127 0.84688958 0.86000692]
mean value: 0.8972821710057162
key: test_accuracy
value: [0.9375 0.8125 0.73333333 1. 0.86666667 0.8
0.8 1. 0.8 0.8 ]
mean value: 0.855
key: train_accuracy
value: [0.94852941 0.95588235 0.96350365 0.94160584 0.97080292 0.97080292
0.94160584 0.93430657 0.91970803 0.9270073 ]
mean value: 0.9473754830399312
key: test_fscore
value: [0.93333333 0.82352941 0.75 1. 0.85714286 0.8
0.82352941 1. 0.84210526 0.82352941]
mean value: 0.8653169688928203
key: train_fscore
value: [0.94656489 0.95652174 0.96296296 0.94366197 0.97142857 0.97101449
0.94444444 0.93430657 0.92413793 0.93055556]
mean value: 0.9485599123980311
key: test_precision
value: [1. 0.77777778 0.66666667 1. 0.85714286 0.75
0.77777778 1. 0.72727273 0.77777778]
mean value: 0.8334415584415584
key: train_precision
value: [0.98412698 0.94285714 0.98484848 0.91780822 0.95774648 0.97101449
0.89473684 0.92753623 0.87012987 0.88157895]
mean value: 0.9332383694125169
key: test_recall
value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286
0.875 1. 1. 0.875 ]
mean value: 0.9071428571428571
key: train_recall
value: [0.91176471 0.97058824 0.94202899 0.97101449 0.98550725 0.97101449
1. 0.94117647 0.98529412 0.98529412]
mean value: 0.9663682864450128
key: test_roc_auc
value: [0.9375 0.8125 0.74107143 1. 0.86607143 0.80357143
0.79464286 1. 0.78571429 0.79464286]
mean value: 0.8535714285714285
key: train_roc_auc
value: [0.94852941 0.95588235 0.96366155 0.9413896 0.9706948 0.97080136
0.94202899 0.93435635 0.92018329 0.92742967]
mean value: 0.9474957374254049
key: test_jcc
value: [0.875 0.7 0.6 1. 0.75 0.66666667
0.7 1. 0.72727273 0.7 ]
mean value: 0.7718939393939394
key: train_jcc
value: [0.89855072 0.91666667 0.92857143 0.89333333 0.94444444 0.94366197
0.89473684 0.87671233 0.85897436 0.87012987]
mean value: 0.9025781969461155
MCC on Blind test: 0.07
Accuracy on Blind test: 0.69
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00999784 0.0094552 0.00723362 0.00716352 0.00696826 0.00690985
0.00689554 0.00771546 0.0079 0.00781608]
mean value: 0.007805538177490234
key: score_time
value: [0.01038742 0.00955176 0.00792933 0.00781178 0.00785041 0.00781822
0.00781894 0.00777292 0.00842071 0.00786495]
mean value: 0.008322644233703613
key: test_mcc
value: [0.8819171 0.62994079 0.49099025 0.875 0.76376262 0.60714286
0.46428571 0.53452248 0.46428571 0.47245559]
mean value: 0.6184303121694533
key: train_mcc
value: [0.88580789 0.81600218 0.92791659 0.9001543 0.80787444 0.80014442
0.8437116 0.64876322 0.87609014 0.86339318]
mean value: 0.836985797579123
key: test_accuracy
value: [0.9375 0.8125 0.73333333 0.93333333 0.86666667 0.8
0.73333333 0.73333333 0.73333333 0.73333333]
mean value: 0.8016666666666666
key: train_accuracy
value: [0.94117647 0.90441176 0.96350365 0.94890511 0.89781022 0.89051095
0.91970803 0.79562044 0.93430657 0.9270073 ]
mean value: 0.912296049806784
key: test_fscore
value: [0.93333333 0.82352941 0.75 0.93333333 0.875 0.8
0.75 0.8 0.75 0.77777778]
mean value: 0.819297385620915
key: train_fscore
value: [0.93846154 0.91034483 0.96296296 0.95104895 0.90666667 0.90196078
0.91472868 0.82926829 0.92913386 0.93150685]
mean value: 0.9176083413476306
key: test_precision
value: [1. 0.77777778 0.66666667 0.875 0.77777778 0.75
0.75 0.66666667 0.75 0.7 ]
mean value: 0.7713888888888889
key: train_precision
value: [0.98387097 0.85714286 0.98484848 0.91891892 0.83950617 0.82142857
0.96721311 0.70833333 1. 0.87179487]
mean value: 0.8953057292802578
key: test_recall
value: [0.875 0.875 0.85714286 1. 1. 0.85714286
0.75 1. 0.75 0.875 ]
mean value: 0.8839285714285714
key: train_recall
value: [0.89705882 0.97058824 0.94202899 0.98550725 0.98550725 1.
0.86764706 1. 0.86764706 1. ]
mean value: 0.9515984654731457
key: test_roc_auc
value: [0.9375 0.8125 0.74107143 0.9375 0.875 0.80357143
0.73214286 0.71428571 0.73214286 0.72321429]
mean value: 0.8008928571428571
key: train_roc_auc
value: [0.94117647 0.90441176 0.96366155 0.94863598 0.89716539 0.88970588
0.91933078 0.79710145 0.93382353 0.92753623]
mean value: 0.9122549019607843
key: test_jcc
value: [0.875 0.7 0.6 0.875 0.77777778 0.66666667
0.6 0.66666667 0.6 0.63636364]
mean value: 0.6997474747474748
key: train_jcc
value: [0.88405797 0.83544304 0.92857143 0.90666667 0.82926829 0.82142857
0.84285714 0.70833333 0.86764706 0.87179487]
mean value: 0.8496068375147647
MCC on Blind test: 0.06
Accuracy on Blind test: 0.66
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.07770419 0.06228852 0.0625062 0.06266785 0.06289601 0.06246185
0.06297612 0.06292748 0.06235862 0.06280899]
mean value: 0.06415958404541015
key: score_time
value: [0.01418233 0.01393175 0.01422071 0.01399136 0.01391673 0.01394653
0.01503801 0.01420355 0.01413107 0.01432395]
mean value: 0.014188599586486817
key: test_mcc
value: [0.8819171 0.75 0.875 0.875 0.73214286 0.87287156
0.87287156 1. 0.75592895 0.76376262]
mean value: 0.8379494644563421
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.875 0.93333333 0.93333333 0.86666667 0.93333333
0.93333333 1. 0.86666667 0.86666667]
mean value: 0.9145833333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.875 0.93333333 0.93333333 0.85714286 0.92307692
0.94117647 1. 0.88888889 0.85714286]
mean value: 0.9142427996839761
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 0.875 0.875 0.85714286 1.
0.88888889 1. 0.8 1. ]
mean value: 0.9171031746031746
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 1. 1. 0.85714286 0.85714286
1. 1. 1. 0.75 ]
mean value: 0.9214285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.875 0.9375 0.9375 0.86607143 0.92857143
0.92857143 1. 0.85714286 0.875 ]
mean value: 0.9142857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.77777778 0.875 0.875 0.75 0.85714286
0.88888889 1. 0.8 0.75 ]
mean value: 0.8448809523809524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.75
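Note on reading the blind-test lines: accuracy can stay high while MCC sits near zero when the blind set is imbalanced and most positives are missed. The sketch below computes both metrics from a confusion matrix. The counts are hypothetical and chosen only for illustration; the blind-test confusion matrices are not recorded in this log.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # usual convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical blind set: 10 positives, 40 negatives (not taken from the log).
tp, fn = 2, 8    # only 2 of the 10 positives are found
fp, tn = 5, 35   # most negatives are classified correctly

acc_value = accuracy(tp, fp, fn, tn)
mcc_value = mcc(tp, fp, fn, tn)
print(f"Accuracy: {acc_value:.2f}")
print(f"MCC: {mcc_value:.2f}")
```

With these made-up counts the classifier is right about three times in four, yet its predictions are barely correlated with the true labels, which is the same pattern as a high blind-test accuracy paired with an MCC below 0.1.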
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02706838 0.02781153 0.04628038 0.03850269 0.0461607 0.04655218
0.04729891 0.04126883 0.03603816 0.0402298 ]
mean value: 0.03972115516662598
key: score_time
value: [0.02073336 0.02294326 0.03598142 0.040658 0.03594398 0.03722
0.03625917 0.02713251 0.02583647 0.03715944]
mean value: 0.03198676109313965
key: test_mcc
value: [0.8819171 0.8819171 1. 1. 0.73214286 0.73214286
0.87287156 0.87287156 0.73214286 1. ]
mean value: 0.8706005900692904
key: train_mcc
value: [0.98540068 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.9970907922626642
key: test_accuracy
value: [0.9375 0.9375 1. 1. 0.86666667 0.86666667
0.93333333 0.93333333 0.86666667 1. ]
mean value: 0.9341666666666667
key: train_accuracy
value: [0.99264706 1. 1. 1. 1. 1.
1. 0.99270073 1. 1. ]
mean value: 0.9985347788750537
key: test_fscore
value: [0.94117647 0.94117647 1. 1. 0.85714286 0.85714286
0.94117647 0.94117647 0.875 1. ]
mean value: 0.9353991596638656
key: train_fscore
value: [0.99259259 1. 1. 1. 1. 1.
1. 0.99270073 1. 1. ]
mean value: 0.99852933225196
key: test_precision
value: [0.88888889 0.88888889 1. 1. 0.85714286 0.85714286
0.88888889 0.88888889 0.875 1. ]
mean value: 0.914484126984127
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.9985507246376811
key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286
1. 1. 0.875 1. ]
mean value: 0.9589285714285715
key: train_recall
value: [0.98529412 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9985294117647059
key: test_roc_auc
value: [0.9375 0.9375 1. 1. 0.86607143 0.86607143
0.92857143 0.92857143 0.86607143 1. ]
mean value: 0.9330357142857143
key: train_roc_auc
value: [0.99264706 1. 1. 1. 1. 1.
1. 0.99275362 1. 1. ]
mean value: 0.9985400682011936
key: test_jcc
value: [0.88888889 0.88888889 1. 1. 0.75 0.75
0.88888889 0.88888889 0.77777778 1. ]
mean value: 0.8833333333333333
key: train_jcc
value: [0.98529412 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.997080136402387
MCC on Blind test: 0.12
Accuracy on Blind test: 0.85
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03373861 0.03912592 0.04229856 0.04023004 0.04611397 0.04011154
0.03928065 0.04038763 0.04050183 0.04010868]
mean value: 0.04018974304199219
key: score_time
value: [0.0198133 0.01117086 0.01123762 0.02080536 0.02091765 0.01118398
0.02124166 0.02203465 0.01984 0.02217436]
mean value: 0.01804194450378418
key: test_mcc
value: [0.77459667 0.37796447 0.33928571 0.56407607 0.76376262 0.73214286
0.37796447 0.87287156 0.64465837 0.46428571]
mean value: 0.5911608523782237
key: train_mcc
value: [0.94117647 0.95598573 0.98550418 0.95630861 0.94160273 0.97080136
0.97080136 0.97080136 0.97080136 0.94201665]
mean value: 0.9605799824099576
key: test_accuracy
value: [0.875 0.6875 0.66666667 0.73333333 0.86666667 0.86666667
0.66666667 0.93333333 0.8 0.73333333]
mean value: 0.7829166666666667
key: train_accuracy
value: [0.97058824 0.97794118 0.99270073 0.97810219 0.97080292 0.98540146
0.98540146 0.98540146 0.98540146 0.97080292]
mean value: 0.9802544010304852
key: test_fscore
value: [0.88888889 0.66666667 0.66666667 0.77777778 0.875 0.85714286
0.61538462 0.94117647 0.84210526 0.75 ]
mean value: 0.7880809206273602
key: train_fscore
value: [0.97058824 0.97810219 0.99280576 0.97810219 0.97101449 0.98550725
0.98529412 0.98529412 0.98529412 0.97101449]
mean value: 0.980301695507708
key: test_precision
value: [0.8 0.71428571 0.625 0.63636364 0.77777778 0.85714286
0.8 0.88888889 0.72727273 0.75 ]
mean value: 0.7576731601731602
key: train_precision
value: [0.97058824 0.97101449 0.98571429 0.98529412 0.97101449 0.98550725
0.98529412 0.98529412 0.98529412 0.95714286]
mean value: 0.9782158080623554
key: test_recall
value: [1. 0.625 0.71428571 1. 1. 0.85714286
0.5 1. 1. 0.75 ]
mean value: 0.8446428571428571
key: train_recall
value: [0.97058824 0.98529412 1. 0.97101449 0.97101449 0.98550725
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.982459505541347
key: test_roc_auc
value: [0.875 0.6875 0.66964286 0.75 0.875 0.86607143
0.67857143 0.92857143 0.78571429 0.73214286]
mean value: 0.7848214285714286
key: train_roc_auc
value: [0.97058824 0.97794118 0.99264706 0.97815431 0.97080136 0.98540068
0.98540068 0.98540068 0.98540068 0.97090793]
mean value: 0.9802642796248935
key: test_jcc
value: [0.8 0.5 0.5 0.63636364 0.77777778 0.75
0.44444444 0.88888889 0.72727273 0.6 ]
mean value: 0.6624747474747474
key: train_jcc
value: [0.94285714 0.95714286 0.98571429 0.95714286 0.94366197 0.97142857
0.97101449 0.97101449 0.97101449 0.94366197]
mean value: 0.9614653136208555
MCC on Blind test: 0.05
Accuracy on Blind test: 0.62
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.09781337 0.10118818 0.09096408 0.09063625 0.08830929 0.08863807
0.1010282 0.0922606 0.0915482 0.09096527]
mean value: 0.09333515167236328
key: score_time
value: [0.00950933 0.00844288 0.00881338 0.00852418 0.00897932 0.00888801
0.00875974 0.00871754 0.00904465 0.00866079]
mean value: 0.008833980560302735
key: test_mcc
value: [0.8819171 0.8819171 1. 1. 0.73214286 0.73214286
0.87287156 0.87287156 0.73214286 1. ]
mean value: 0.8706005900692904
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.9375 1. 1. 0.86666667 0.86666667
0.93333333 0.93333333 0.86666667 1. ]
mean value: 0.9341666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.94117647 1. 1. 0.85714286 0.85714286
0.94117647 0.94117647 0.875 1. ]
mean value: 0.9353991596638656
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.88888889 1. 1. 0.85714286 0.85714286
0.88888889 0.88888889 0.875 1. ]
mean value: 0.914484126984127
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286
1. 1. 0.875 1. ]
mean value: 0.9589285714285715
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.9375 1. 1. 0.86607143 0.86607143
0.92857143 0.92857143 0.86607143 1. ]
mean value: 0.9330357142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.88888889 1. 1. 0.75 0.75
0.88888889 0.88888889 0.77777778 1. ]
mean value: 0.8833333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.82
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00983596 0.01095533 0.01153588 0.01127434 0.01293206 0.01323128
0.01175475 0.01181364 0.01138139 0.01201797]
mean value: 0.011673259735107421
key: score_time
value: [0.01050639 0.01042032 0.01051211 0.01094747 0.01169777 0.01332498
0.01089931 0.01096082 0.01095772 0.01388907]
mean value: 0.011411595344543456
key: test_mcc
value: [0.75 0.62994079 0.64465837 0.64465837 0.6000992 0.34247476
0.46770717 0.49099025 0.33928571 0.66143783]
mean value: 0.5571252457078674
key: train_mcc
value: [0.84051051 0.92737353 0.90259957 0.80073303 0.88938138 0.71739374
0.94318882 0.82498207 0.90246052 0.92944673]
mean value: 0.8678069912939567
key: test_accuracy
value: [0.875 0.8125 0.8 0.8 0.8 0.66666667
0.66666667 0.73333333 0.66666667 0.8 ]
mean value: 0.7620833333333333
key: train_accuracy
value: [0.91911765 0.96323529 0.94890511 0.89051095 0.94160584 0.83941606
0.97080292 0.90510949 0.94890511 0.96350365]
mean value: 0.9291112065264062
key: test_fscore
value: [0.875 0.82352941 0.72727273 0.72727273 0.76923077 0.54545455
0.54545455 0.71428571 0.66666667 0.76923077]
mean value: 0.7163397876633171
key: train_fscore
value: [0.91603053 0.96240602 0.94656489 0.87804878 0.93846154 0.81034483
0.96969697 0.89430894 0.94573643 0.96183206]
mean value: 0.9223430989384103
key: test_precision
value: [0.875 0.77777778 1. 1. 0.83333333 0.75
1. 0.83333333 0.71428571 1. ]
mean value: 0.8783730158730159
key: train_precision
value: [0.95238095 0.98461538 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9936996336996337
key: test_recall
value: [0.875 0.875 0.57142857 0.57142857 0.71428571 0.42857143
0.375 0.625 0.625 0.625 ]
mean value: 0.6285714285714286
key: train_recall
value: [0.88235294 0.94117647 0.89855072 0.7826087 0.88405797 0.68115942
0.94117647 0.80882353 0.89705882 0.92647059]
mean value: 0.8643435635123615
key: test_roc_auc
value: [0.875 0.8125 0.78571429 0.78571429 0.79464286 0.65178571
0.6875 0.74107143 0.66964286 0.8125 ]
mean value: 0.7616071428571428
key: train_roc_auc
value: [0.91911765 0.96323529 0.94927536 0.89130435 0.94202899 0.84057971
0.97058824 0.90441176 0.94852941 0.96323529]
mean value: 0.9292306052855925
key: test_jcc
value: [0.77777778 0.7 0.57142857 0.57142857 0.625 0.375
0.375 0.55555556 0.5 0.625 ]
mean value: 0.5676190476190476
key: train_jcc
value: [0.84507042 0.92753623 0.89855072 0.7826087 0.88405797 0.68115942
0.94117647 0.80882353 0.89705882 0.92647059]
mean value: 0.8592512877778178
MCC on Blind test: 0.13
Accuracy on Blind test: 0.85
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01431322 0.01028609 0.0085125 0.00834036 0.00857472 0.00834465
0.00830102 0.00752926 0.0077374 0.00805783]
mean value: 0.00899970531463623
key: score_time
value: [0.01112556 0.00929952 0.00890088 0.00855279 0.0085485 0.00859904
0.00831628 0.00797892 0.00823665 0.00807309]
mean value: 0.00876312255859375
key: test_mcc
value: [0.8819171 0.62994079 0.66143783 1. 0.875 0.73214286
0.6000992 1. 0.75592895 0.6000992 ]
mean value: 0.7736565919262326
key: train_mcc
value: [0.86849267 0.89715584 0.89791134 0.88355744 0.88355744 0.89863497
0.85440207 0.85440207 0.89791134 0.86948194]
mean value: 0.8805507116446566
key: test_accuracy
value: [0.9375 0.8125 0.8 1. 0.93333333 0.86666667
0.8 1. 0.86666667 0.8 ]
mean value: 0.8816666666666667
key: train_accuracy
value: [0.93382353 0.94852941 0.94890511 0.94160584 0.94160584 0.94890511
0.9270073 0.9270073 0.94890511 0.93430657]
mean value: 0.9400601116358952
key: test_fscore
value: [0.93333333 0.82352941 0.82352941 1. 0.93333333 0.85714286
0.82352941 1. 0.88888889 0.82352941]
mean value: 0.8906816059757237
key: train_fscore
value: [0.9352518 0.94890511 0.94890511 0.94285714 0.94285714 0.95035461
0.92753623 0.92753623 0.94890511 0.9352518 ]
mean value: 0.9408360285000935
key: test_precision
value: [1. 0.77777778 0.7 1. 0.875 0.85714286
0.77777778 1. 0.8 0.77777778]
mean value: 0.856547619047619
key: train_precision
value: [0.91549296 0.94202899 0.95588235 0.92957746 0.92957746 0.93055556
0.91428571 0.91428571 0.94202899 0.91549296]
mean value: 0.9289208153153076
key: test_recall
value: [0.875 0.875 1. 1. 1. 0.85714286
0.875 1. 1. 0.875 ]
mean value: 0.9357142857142857
key: train_recall
value: [0.95588235 0.95588235 0.94202899 0.95652174 0.95652174 0.97101449
0.94117647 0.94117647 0.95588235 0.95588235]
mean value: 0.9531969309462915
key: test_roc_auc
value: [0.9375 0.8125 0.8125 1. 0.9375 0.86607143
0.79464286 1. 0.85714286 0.79464286]
mean value: 0.88125
key: train_roc_auc
value: [0.93382353 0.94852941 0.94895567 0.94149616 0.94149616 0.94874254
0.92710997 0.92710997 0.94895567 0.93446292]
mean value: 0.940068201193521
key: test_jcc
value: [0.875 0.7 0.7 1. 0.875 0.75 0.7 1. 0.8 0.7 ]
mean value: 0.8099999999999999
key: train_jcc
value: [0.87837838 0.90277778 0.90277778 0.89189189 0.89189189 0.90540541
0.86486486 0.86486486 0.90277778 0.87837838]
mean value: 0.888400900900901
MCC on Blind test: 0.06
Accuracy on Blind test: 0.67
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07313299 0.06227994 0.06231952 0.06033921 0.06083584 0.06107545
0.06096387 0.06110859 0.06186771 0.06140947]
mean value: 0.06253325939178467
key: score_time
value: [0.00833368 0.00824118 0.00828338 0.00820613 0.00824714 0.00827527
0.00827289 0.00825977 0.00888276 0.00831437]
mean value: 0.008331656455993652
key: test_mcc
value: [0.8819171 0.62994079 0.66143783 1. 0.875 0.73214286
0.75592895 1. 0.75592895 0.6000992 ]
mean value: 0.7892395667131802
key: train_mcc
value: [0.86849267 0.89715584 0.8978896 0.89863497 0.88355744 0.92709446
0.89869927 0.85440207 0.92710997 0.87099729]
mean value: 0.8924033569902855
key: test_accuracy
value: [0.9375 0.8125 0.8 1. 0.93333333 0.86666667
0.86666667 1. 0.86666667 0.8 ]
mean value: 0.8883333333333333
key: train_accuracy
value: [0.93382353 0.94852941 0.94890511 0.94890511 0.94160584 0.96350365
0.94890511 0.9270073 0.96350365 0.93430657]
mean value: 0.9458995276942894
key: test_fscore
value: [0.93333333 0.82352941 0.82352941 1. 0.93333333 0.85714286
0.88888889 1. 0.88888889 0.82352941]
mean value: 0.8972175536881419
key: train_fscore
value: [0.9352518 0.94890511 0.94964029 0.95035461 0.94285714 0.96402878
0.94964029 0.92753623 0.96350365 0.93617021]
mean value: 0.946788810763946
key: test_precision
value: [1. 0.77777778 0.7 1. 0.875 0.85714286
0.8 1. 0.8 0.77777778]
mean value: 0.8587698412698412
key: train_precision
value: [0.91549296 0.94202899 0.94285714 0.93055556 0.92957746 0.95714286
0.92957746 0.91428571 0.95652174 0.90410959]
mean value: 0.932214947084399
key: test_recall
value: [0.875 0.875 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9482142857142857
key: train_recall
value: [0.95588235 0.95588235 0.95652174 0.97101449 0.95652174 0.97101449
0.97058824 0.94117647 0.97058824 0.97058824]
mean value: 0.9619778346121057
key: test_roc_auc
value: [0.9375 0.8125 0.8125 1. 0.9375 0.86607143
0.85714286 1. 0.85714286 0.79464286]
mean value: 0.8875000000000001
key: train_roc_auc
value: [0.93382353 0.94852941 0.9488491 0.94874254 0.94149616 0.96344842
0.94906223 0.92710997 0.96355499 0.93456948]
mean value: 0.9459185848252345
key: test_jcc
value: [0.875 0.7 0.7 1. 0.875 0.75 0.8 1. 0.8 0.7 ]
mean value: 0.82
key: train_jcc
value: [0.87837838 0.90277778 0.90410959 0.90540541 0.89189189 0.93055556
0.90410959 0.86486486 0.92957746 0.88 ]
mean value: 0.8991670516744799
MCC on Blind test: 0.06
Accuracy on Blind test: 0.67
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01616359 0.01377511 0.01260805 0.01199722 0.01317811 0.01211739
0.01303506 0.01296759 0.01235151 0.01292706]
mean value: 0.013112068176269531
key: score_time
value: [0.01072264 0.00871634 0.00817013 0.00809884 0.00805712 0.0079968
0.00806427 0.00803137 0.00803447 0.00811362]
mean value: 0.008400559425354004
key: test_mcc
value: [0.8819171 0.5 0.37796447 0.73214286 0.87287156 0.60714286
0.60714286 0.60714286 0.64465837 0.6000992 ]
mean value: 0.6431082135582106
key: train_mcc
value: [0.77949606 0.80961181 0.82629176 0.78182997 0.81031543 0.82480818
0.75186529 0.81092683 0.82614456 0.79560955]
mean value: 0.8016899442942331
key: test_accuracy
value: [0.9375 0.75 0.66666667 0.86666667 0.93333333 0.8
0.8 0.8 0.8 0.8 ]
mean value: 0.8154166666666667
key: train_accuracy
value: [0.88970588 0.90441176 0.91240876 0.89051095 0.90510949 0.91240876
0.87591241 0.90510949 0.91240876 0.89781022]
mean value: 0.9005796479175612
key: test_fscore
value: [0.93333333 0.75 0.70588235 0.85714286 0.92307692 0.8
0.8 0.8 0.84210526 0.82352941]
mean value: 0.823507014141689
key: train_fscore
value: [0.88888889 0.90225564 0.91044776 0.88888889 0.90510949 0.91304348
0.87407407 0.90225564 0.90909091 0.89705882]
mean value: 0.8991113591173656
key: test_precision
value: [1. 0.75 0.6 0.85714286 1. 0.75
0.85714286 0.85714286 0.72727273 0.77777778]
mean value: 0.8176479076479076
key: train_precision
value: [0.89552239 0.92307692 0.93846154 0.90909091 0.91176471 0.91304348
0.88059701 0.92307692 0.9375 0.89705882]
mean value: 0.9129192704364003
key: test_recall
value: [0.875 0.75 0.85714286 0.85714286 0.85714286 0.85714286
0.75 0.75 1. 0.875 ]
mean value: 0.8428571428571429
key: train_recall
value: [0.88235294 0.88235294 0.88405797 0.86956522 0.89855072 0.91304348
0.86764706 0.88235294 0.88235294 0.89705882]
mean value: 0.8859335038363171
key: test_roc_auc
value: [0.9375 0.75 0.67857143 0.86607143 0.92857143 0.80357143
0.80357143 0.80357143 0.78571429 0.79464286]
mean value: 0.8151785714285714
key: train_roc_auc
value: [0.88970588 0.90441176 0.91261722 0.89066496 0.90515772 0.91240409
0.87585251 0.90494459 0.91219096 0.89780477]
mean value: 0.9005754475703325
key: test_jcc
value: [0.875 0.6 0.54545455 0.75 0.85714286 0.66666667
0.66666667 0.66666667 0.72727273 0.7 ]
mean value: 0.705487012987013
key: train_jcc
value: [0.8 0.82191781 0.83561644 0.8 0.82666667 0.84
0.77631579 0.82191781 0.83333333 0.81333333]
mean value: 0.8169101177601538
MCC on Blind test: 0.12
Accuracy on Blind test: 0.66
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.37280297 0.37843227 0.38014102 0.37847543 0.37922144 0.38759017
0.37933087 0.39223146 0.38670659 0.38647294]
mean value: 0.38214051723480225
key: score_time
value: [0.0084753 0.00828695 0.00884271 0.00918055 0.00932026 0.00898337
0.00927162 0.00885415 0.00936317 0.00934863]
mean value: 0.008992671966552734
key: test_mcc
value: [1. 0.77459667 0.66143783 0.76376262 0.73214286 0.60714286
0.75592895 0.87287156 0.75592895 0.6000992 ]
mean value: 0.7523911478249176
key: train_mcc
value: [0.94158382 1. 0.95629932 0.94199209 0.95629932 0.98550418
0.95713391 1. 1. 1. ]
mean value: 0.9738812635764046
key: test_accuracy
value: [1. 0.875 0.8 0.86666667 0.86666667 0.8
0.86666667 0.93333333 0.86666667 0.8 ]
mean value: 0.8675
key: train_accuracy
value: [0.97058824 1. 0.97810219 0.97080292 0.97810219 0.99270073
0.97810219 1. 1. 1. ]
mean value: 0.986839845427222
key: test_fscore
value: [1. 0.88888889 0.82352941 0.875 0.85714286 0.8
0.88888889 0.94117647 0.88888889 0.82352941]
mean value: 0.8787044817927171
key: train_fscore
value: [0.97101449 1. 0.97841727 0.97142857 0.97841727 0.99280576
0.97841727 1. 1. 1. ]
mean value: 0.9870500618139029
key: test_precision
value: [1. 0.8 0.7 0.77777778 0.85714286 0.75
0.8 0.88888889 0.8 0.77777778]
mean value: 0.8151587301587302
key: train_precision
value: [0.95714286 1. 0.97142857 0.95774648 0.97142857 0.98571429
0.95774648 1. 1. 1. ]
mean value: 0.9801207243460764
key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9589285714285715
key: train_recall
value: [0.98529412 1. 0.98550725 0.98550725 0.98550725 1.
1. 1. 1. 1. ]
mean value: 0.9941815856777494
key: test_roc_auc
value: [1. 0.875 0.8125 0.875 0.86607143 0.80357143
0.85714286 0.92857143 0.85714286 0.79464286]
mean value: 0.8669642857142857
key: train_roc_auc
value: [0.97058824 1. 0.97804774 0.9706948 0.97804774 0.99264706
0.97826087 1. 1. 1. ]
mean value: 0.9868286445012788
key: test_jcc
value: [1. 0.8 0.7 0.77777778 0.75 0.66666667
0.8 0.88888889 0.8 0.7 ]
mean value: 0.7883333333333333
key: train_jcc
value: [0.94366197 1. 0.95774648 0.94444444 0.95774648 0.98571429
0.95774648 1. 1. 1. ]
mean value: 0.9747060138609435
MCC on Blind test: 0.0
Accuracy on Blind test: 0.68
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00959182 0.00908065 0.00727654 0.0070405 0.0074923 0.00699615
0.00736642 0.00702 0.00749421 0.0074172 ]
mean value: 0.0076775789260864254
key: score_time
value: [0.01065612 0.01025677 0.00826311 0.0082767 0.00856185 0.00839043
0.00823283 0.00838685 0.00851226 0.00863767]
mean value: 0.008817458152770996
key: test_mcc
value: [0.77459667 0.37796447 0.49099025 0.37796447 0.21821789 0.49099025
0.18898224 0.46428571 0.64465837 0.20044593]
mean value: 0.42290962650028463
key: train_mcc
value: [0.57208135 0.54899485 0.52400868 0.47754676 0.56162481 0.60455208
0.60096088 0.6802431 0.57604541 0.66161034]
mean value: 0.5807668254236807
key: test_accuracy
value: [0.875 0.6875 0.73333333 0.66666667 0.6 0.73333333
0.6 0.73333333 0.8 0.6 ]
mean value: 0.7029166666666666
key: train_accuracy
value: [0.77205882 0.76470588 0.74452555 0.72992701 0.76642336 0.79562044
0.78832117 0.83211679 0.77372263 0.81751825]
mean value: 0.7784939888364105
key: test_fscore
value: [0.88888889 0.66666667 0.75 0.70588235 0.625 0.75
0.66666667 0.75 0.84210526 0.7 ]
mean value: 0.7345209838321294
key: train_fscore
value: [0.80254777 0.79220779 0.78527607 0.76433121 0.8 0.77419355
0.81290323 0.80991736 0.80254777 0.83870968]
mean value: 0.7982634424404584
key: test_precision
value: [0.8 0.71428571 0.66666667 0.6 0.55555556 0.66666667
0.6 0.75 0.72727273 0.58333333]
mean value: 0.6663780663780664
key: train_precision
value: [0.70786517 0.70930233 0.68085106 0.68181818 0.7032967 0.87272727
0.72413793 0.9245283 0.70786517 0.74712644]
mean value: 0.7459518554034876
key: test_recall
value: [1. 0.625 0.85714286 0.85714286 0.71428571 0.85714286
0.75 0.75 1. 0.875 ]
mean value: 0.8285714285714285
key: train_recall
value: [0.92647059 0.89705882 0.92753623 0.86956522 0.92753623 0.69565217
0.92647059 0.72058824 0.92647059 0.95588235]
mean value: 0.8773231031543052
key: test_roc_auc
value: [0.875 0.6875 0.74107143 0.67857143 0.60714286 0.74107143
0.58928571 0.73214286 0.78571429 0.58035714]
mean value: 0.7017857142857142
key: train_roc_auc
value: [0.77205882 0.76470588 0.74317988 0.72890026 0.7652387 0.7963555
0.78932225 0.83130861 0.7748295 0.81852089]
mean value: 0.7784420289855073
key: test_jcc
value: [0.8 0.5 0.6 0.54545455 0.45454545 0.6
0.5 0.6 0.72727273 0.53846154]
mean value: 0.5865734265734266
key: train_jcc
value: [0.67021277 0.65591398 0.64646465 0.6185567 0.66666667 0.63157895
0.68478261 0.68055556 0.67021277 0.72222222]
mean value: 0.6647166858413609
MCC on Blind test: 0.02
Accuracy on Blind test: 0.47
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00780797 0.00748897 0.00779843 0.00788808 0.007725 0.00771189
0.00735712 0.00735378 0.0075953 0.00707674]
mean value: 0.007580327987670899
key: score_time
value: [0.00868344 0.00823379 0.00862813 0.00852036 0.00870013 0.00808978
0.00818658 0.0080483 0.00845337 0.0080812 ]
mean value: 0.008362507820129395
key: test_mcc
value: [0.25 0.25819889 0.07142857 0.33928571 0.46428571 0.13363062
0.33928571 0.46428571 0.33928571 0.49099025]
mean value: 0.3150676906591499
key: train_mcc
value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782
0.4312221 0.41698711 0.44522592 0.43208129]
mean value: 0.46575135687893415
key: test_accuracy
value: [0.625 0.625 0.53333333 0.66666667 0.73333333 0.53333333
0.66666667 0.73333333 0.66666667 0.73333333]
mean value: 0.6516666666666666
key: train_accuracy
value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701
0.71532847 0.7080292 0.72262774 0.71532847]
mean value: 0.7317410905968227
key: test_fscore
value: [0.625 0.57142857 0.53333333 0.66666667 0.71428571 0.63157895
0.66666667 0.75 0.66666667 0.71428571]
mean value: 0.6539912280701754
key: train_fscore
value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874
0.71942446 0.71428571 0.72058824 0.72340426]
mean value: 0.7420250432480666
key: test_precision
value: [0.625 0.66666667 0.5 0.625 0.71428571 0.5
0.71428571 0.75 0.71428571 0.83333333]
mean value: 0.6642857142857143
key: train_precision
value: [0.72 0.7037037 0.73611111 0.70886076 0.73684211 0.71621622
0.70422535 0.69444444 0.72058824 0.69863014]
mean value: 0.7139622064625399
key: test_recall
value: [0.625 0.5 0.57142857 0.71428571 0.71428571 0.85714286
0.625 0.75 0.625 0.625 ]
mean value: 0.6607142857142857
key: train_recall
value: [0.79411765 0.83823529 0.76811594 0.8115942 0.8115942 0.76811594
0.73529412 0.73529412 0.72058824 0.75 ]
mean value: 0.7732949701619778
key: test_roc_auc
value: [0.625 0.625 0.53571429 0.66964286 0.73214286 0.55357143
0.66964286 0.73214286 0.66964286 0.74107143]
mean value: 0.6553571428571429
key: train_roc_auc
value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621
0.71547315 0.70822677 0.72261296 0.71557971]
mean value: 0.7316602728047741
key: test_jcc
value: [0.45454545 0.4 0.36363636 0.5 0.55555556 0.46153846
0.5 0.6 0.5 0.55555556]
mean value: 0.4890831390831391
key: train_jcc
value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889
0.56179775 0.55555556 0.56321839 0.56666667]
mean value: 0.5902615907742418
MCC on Blind test: 0.1
Accuracy on Blind test: 0.58
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00691152 0.00645924 0.00683975 0.00714922 0.00726128 0.00734305
0.00735426 0.00752354 0.00730586 0.00730991]
mean value: 0.0071457624435424805
key: score_time
value: [0.00938153 0.00884461 0.00903273 0.0093987 0.00931835 0.00955343
0.00963545 0.0096395 0.00957155 0.00971961]
mean value: 0.009409546852111816
key: test_mcc
value: [ 0.62994079 0.5 0.49099025 0.6000992 0.49099025 0.32732684
-0.02620712 0.46428571 0.32732684 0.32732684]
mean value: 0.4132079591989289
key: train_mcc
value: [0.69731096 0.6918501 0.75815907 0.66971076 0.69510727 0.70910029
0.6523446 0.71313464 0.68163703 0.66616982]
mean value: 0.6934524542628495
key: test_accuracy
value: [0.8125 0.75 0.73333333 0.8 0.73333333 0.66666667
0.46666667 0.73333333 0.66666667 0.66666667]
mean value: 0.7029166666666666
key: train_accuracy
value: [0.84558824 0.84558824 0.87591241 0.83211679 0.84671533 0.8540146
0.82481752 0.8540146 0.83941606 0.83211679]
mean value: 0.8450300558179475
key: test_fscore
value: [0.82352941 0.75 0.75 0.76923077 0.75 0.61538462
0.2 0.75 0.70588235 0.70588235]
mean value: 0.6819909502262443
key: train_fscore
value: [0.85517241 0.84892086 0.88435374 0.84353741 0.85314685 0.85915493
0.83098592 0.86111111 0.84507042 0.83687943]
mean value: 0.8518333098052753
key: test_precision
value: [0.77777778 0.75 0.66666667 0.83333333 0.66666667 0.66666667
0.5 0.75 0.66666667 0.66666667]
mean value: 0.6944444444444444
key: train_precision
value: [0.80519481 0.83098592 0.83333333 0.79487179 0.82432432 0.83561644
0.7972973 0.81578947 0.81081081 0.80821918]
mean value: 0.815644337144789
key: test_recall
value: [0.875 0.75 0.85714286 0.71428571 0.85714286 0.57142857
0.125 0.75 0.75 0.75 ]
mean value: 0.7
key: train_recall
value: [0.91176471 0.86764706 0.94202899 0.89855072 0.88405797 0.88405797
0.86764706 0.91176471 0.88235294 0.86764706]
mean value: 0.8917519181585678
key: test_roc_auc
value: [0.8125 0.75 0.74107143 0.79464286 0.74107143 0.66071429
0.49107143 0.73214286 0.66071429 0.66071429]
mean value: 0.7044642857142858
key: train_roc_auc
value: [0.84558824 0.84558824 0.87542626 0.8316283 0.84644075 0.85379369
0.82512788 0.85443308 0.8397272 0.83237425]
mean value: 0.8450127877237852
key: test_jcc
value: [0.7 0.6 0.6 0.625 0.6 0.44444444
0.11111111 0.6 0.54545455 0.54545455]
mean value: 0.5371464646464646
key: train_jcc
value: [0.74698795 0.7375 0.79268293 0.72941176 0.74390244 0.75308642
0.71084337 0.75609756 0.73170732 0.7195122 ]
mean value: 0.7421731948784563
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00869989 0.00870991 0.00899339 0.00884175 0.00880337 0.00869131
0.00819468 0.00880289 0.00880551 0.00904202]
mean value: 0.008758473396301269
key: score_time
value: [0.00901675 0.00891495 0.00872993 0.00879502 0.00886464 0.00874734
0.00867009 0.00894928 0.00889111 0.00890279]
mean value: 0.008848190307617188
key: test_mcc
value: [0.75 0.62994079 0.49099025 0.76376262 0.60714286 0.60714286
0.76376262 0.60714286 0.75592895 0.73214286]
mean value: 0.6707956647621525
key: train_mcc
value: [0.85294118 0.88235294 0.86868474 0.81027501 0.8687127 0.85434012
0.89869927 0.89869927 0.89863497 0.85440207]
mean value: 0.8687742253690149
key: test_accuracy
value: [0.875 0.8125 0.73333333 0.86666667 0.8 0.8
0.86666667 0.8 0.86666667 0.86666667]
mean value: 0.82875
key: train_accuracy
value: [0.92647059 0.94117647 0.93430657 0.90510949 0.93430657 0.9270073
0.94890511 0.94890511 0.94890511 0.9270073 ]
mean value: 0.9342099613568055
key: test_fscore
value: [0.875 0.8 0.75 0.875 0.8 0.8
0.85714286 0.8 0.88888889 0.875 ]
mean value: 0.8321031746031746
key: train_fscore
value: [0.92647059 0.94117647 0.9352518 0.90647482 0.93430657 0.92857143
0.94964029 0.94964029 0.94736842 0.92753623]
mean value: 0.9346436903919317
key: test_precision
value: [0.875 0.85714286 0.66666667 0.77777778 0.75 0.75
1. 0.85714286 0.8 0.875 ]
mean value: 0.8208730158730159
key: train_precision
value: [0.92647059 0.94117647 0.92857143 0.9 0.94117647 0.91549296
0.92957746 0.92957746 0.96923077 0.91428571]
mean value: 0.929555932882362
key: test_recall
value: [0.875 0.75 0.85714286 1. 0.85714286 0.85714286
0.75 0.75 1. 0.875 ]
mean value: 0.8571428571428571
key: train_recall
value: [0.92647059 0.94117647 0.94202899 0.91304348 0.92753623 0.94202899
0.97058824 0.97058824 0.92647059 0.94117647]
mean value: 0.9401108269394715
key: test_roc_auc
value: [0.875 0.8125 0.74107143 0.875 0.80357143 0.80357143
0.875 0.80357143 0.85714286 0.86607143]
mean value: 0.83125
key: train_roc_auc
value: [0.92647059 0.94117647 0.93424979 0.90505115 0.93435635 0.92689685
0.94906223 0.94906223 0.94874254 0.92710997]
mean value: 0.9342178175618073
key: test_jcc
value: [0.77777778 0.66666667 0.6 0.77777778 0.66666667 0.66666667
0.75 0.66666667 0.8 0.77777778]
mean value: 0.715
key: train_jcc
value: [0.8630137 0.88888889 0.87837838 0.82894737 0.87671233 0.86666667
0.90410959 0.90410959 0.9 0.86486486]
mean value: 0.8775691372699304
MCC on Blind test: 0.13
Accuracy on Blind test: 0.69
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.47774863 0.47219229 0.59577179 0.48569918 0.47681928 0.47980237
0.52681231 0.59629416 0.46796393 0.48238492]
mean value: 0.506148886680603
key: score_time
value: [0.01098704 0.01345611 0.01107907 0.01333833 0.01326942 0.01334047
0.01134348 0.01388001 0.01111221 0.01353312]
mean value: 0.012533926963806152
key: test_mcc
value: [1. 0.77459667 0.37796447 0.60714286 0.76376262 0.60714286
0.46428571 0.60714286 0.75592895 0.73214286]
mean value: 0.6690109846952281
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.875 0.66666667 0.8 0.86666667 0.8
0.73333333 0.8 0.86666667 0.86666667]
mean value: 0.8275
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.88888889 0.70588235 0.8 0.875 0.8
0.75 0.8 0.88888889 0.875 ]
mean value: 0.8383660130718954
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 0.6 0.75 0.77777778 0.75
0.75 0.85714286 0.8 0.875 ]
mean value: 0.7959920634920635
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.85714286 0.85714286 1. 0.85714286
0.75 0.75 1. 0.875 ]
mean value: 0.8946428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.875 0.67857143 0.80357143 0.875 0.80357143
0.73214286 0.80357143 0.85714286 0.86607143]
mean value: 0.8294642857142858
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.8 0.54545455 0.66666667 0.77777778 0.66666667
0.6 0.66666667 0.8 0.77777778]
mean value: 0.7301010101010101
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01036239 0.00926948 0.00805068 0.00823069 0.00814795 0.00783706
0.00812101 0.00809956 0.00801969 0.00834656]
mean value: 0.008448505401611328
key: score_time
value: [0.01827431 0.00897479 0.00929928 0.00881314 0.00856185 0.00861549
0.00854754 0.00869775 0.00881147 0.00856495]
mean value: 0.009716057777404785
key: test_mcc
value: [1. 0.8819171 1. 1. 0.875 0.87287156
0.87287156 0.75592895 0.87287156 0.875 ]
mean value: 0.9006460732538559
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 1. 1. 0.93333333 0.93333333
0.93333333 0.86666667 0.93333333 0.93333333]
mean value: 0.9470833333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 1. 1. 0.93333333 0.92307692
0.94117647 0.88888889 0.94117647 0.93333333]
mean value: 0.9502161890397185
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 1. 1. 0.875 1.
0.88888889 0.8 0.88888889 1. ]
mean value: 0.9341666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 1. 1. 0.9375 0.92857143
0.92857143 0.85714286 0.92857143 0.9375 ]
mean value: 0.9455357142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 1. 1. 0.875 0.85714286
0.88888889 0.8 0.88888889 0.875 ]
mean value: 0.9073809523809524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.85
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08768535 0.08900571 0.08186841 0.087538 0.08398795 0.08419561
0.083009 0.08226275 0.08182836 0.0807085 ]
mean value: 0.08420896530151367
key: score_time
value: [0.01827073 0.01797938 0.01798153 0.01794004 0.01733184 0.01720476
0.01783872 0.01791549 0.017555 0.01719594]
mean value: 0.017721343040466308
key: test_mcc
value: [1. 0.75 0.73214286 1. 0.875 0.73214286
0.60714286 0.76376262 0.87287156 0.76376262]
mean value: 0.8096825364024487
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.875 0.86666667 1. 0.93333333 0.86666667
0.8 0.86666667 0.93333333 0.86666667]
mean value: 0.9008333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.875 0.85714286 1. 0.93333333 0.85714286
0.8 0.85714286 0.94117647 0.85714286]
mean value: 0.8978081232492997
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.875 0.85714286 1. 0.875 0.85714286
0.85714286 1. 0.88888889 1. ]
mean value: 0.921031746031746
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 0.85714286 1. 1. 0.85714286
0.75 0.75 1. 0.75 ]
mean value: 0.8839285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.875 0.86607143 1. 0.9375 0.86607143
0.80357143 0.875 0.92857143 0.875 ]
mean value: 0.9026785714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.77777778 0.75 1. 0.875 0.75
0.66666667 0.75 0.88888889 0.75 ]
mean value: 0.8208333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00700188 0.00755453 0.00768661 0.00771928 0.00732183 0.00702024
0.00708175 0.00722766 0.00722528 0.00710702]
mean value: 0.007294607162475586
key: score_time
value: [0.00804567 0.00840735 0.0089283 0.0083468 0.0084908 0.00807238
0.00812387 0.0082643 0.00818753 0.00800323]
mean value: 0.00828702449798584
key: test_mcc
value: [0.8819171 0.8819171 0.73214286 1. 0.76376262 0.46428571
0.49099025 0.60714286 0.875 0.87287156]
mean value: 0.7570030065748747
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.9375 0.86666667 1. 0.86666667 0.73333333
0.73333333 0.8 0.93333333 0.93333333]
mean value: 0.8741666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.93333333 0.85714286 1. 0.875 0.71428571
0.71428571 0.8 0.93333333 0.94117647]
mean value: 0.8709733893557423
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 1. 0.85714286 1. 0.77777778 0.71428571
0.83333333 0.85714286 1. 0.88888889]
mean value: 0.8817460317460317
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 0.85714286 1. 1. 0.71428571
0.625 0.75 0.875 1. ]
mean value: 0.8696428571428572
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.9375 0.86607143 1. 0.875 0.73214286
0.74107143 0.80357143 0.9375 0.92857143]
mean value: 0.8758928571428571
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.875 0.75 1. 0.77777778 0.55555556
0.55555556 0.66666667 0.875 0.88888889]
mean value: 0.7833333333333333
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.73
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.0097611 1.00532484 1.01266861 1.00928307 1.0162096 1.00374866
1.01397824 1.01551723 1.02490425 1.04691529]
mean value: 1.0158310890197755
key: score_time
value: [0.15017748 0.09301543 0.09229612 0.09591055 0.09012294 0.09085989
0.09002423 0.09411788 0.09723639 0.09498525]
mean value: 0.09887461662292481
key: test_mcc
value: [1. 0.8819171 0.76376262 1. 0.875 0.73214286
0.60714286 0.73214286 0.87287156 0.875 ]
mean value: 0.8339979851886711
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 0.86666667 1. 0.93333333 0.86666667
0.8 0.86666667 0.93333333 0.93333333]
mean value: 0.9137500000000001
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 0.875 1. 0.93333333 0.85714286
0.8 0.875 0.94117647 0.93333333]
mean value: 0.9156162464985994
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 0.77777778 1. 0.875 0.85714286
0.85714286 0.875 0.88888889 1. ]
mean value: 0.901984126984127
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
0.75 0.875 1. 0.875 ]
mean value: 0.9357142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 0.875 1. 0.9375 0.86607143
0.80357143 0.86607143 0.92857143 0.9375 ]
mean value: 0.9151785714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 0.77777778 1. 0.875 0.75
0.66666667 0.77777778 0.88888889 0.875 ]
mean value: 0.85
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.84915781 0.96325994 0.88296103 0.89262009 0.85590529 0.8617866
0.86219358 0.88421559 0.87171268 0.90677834]
mean value: 0.8830590963363647
key: score_time
value: [0.23183656 0.20871425 0.23059011 0.22569108 0.22029448 0.24542952
0.22994447 0.24367285 0.24643469 0.23650432]
mean value: 0.23191123008728026
key: test_mcc
value: [1. 0.75 0.76376262 1. 0.73214286 0.60714286
0.60714286 0.73214286 0.87287156 0.875 ]
mean value: 0.794020560534137
key: train_mcc
value: [0.98540068 0.94117647 0.98550418 0.97120941 0.94160273 0.98550418
0.98550725 0.98550725 0.97122151 0.97122151]
mean value: 0.9723855158091337
key: test_accuracy
value: [1. 0.875 0.86666667 1. 0.86666667 0.8
0.8 0.86666667 0.93333333 0.93333333]
mean value: 0.8941666666666667
key: train_accuracy
value: [0.99264706 0.97058824 0.99270073 0.98540146 0.97080292 0.99270073
0.99270073 0.99270073 0.98540146 0.98540146]
mean value: 0.986104551309575
key: test_fscore
value: [1. 0.875 0.875 1. 0.85714286 0.8
0.8 0.875 0.94117647 0.93333333]
mean value: 0.8956652661064426
key: train_fscore
value: [0.99270073 0.97058824 0.99280576 0.98571429 0.97101449 0.99280576
0.99270073 0.99270073 0.98550725 0.98550725]
mean value: 0.9862045207088039
key: test_precision
value: [1. 0.875 0.77777778 1. 0.85714286 0.75
0.85714286 0.875 0.88888889 1. ]
mean value: 0.888095238095238
key: train_precision
value: [0.98550725 0.97058824 0.98571429 0.97183099 0.97101449 0.98571429
0.98550725 0.98550725 0.97142857 0.97142857]
mean value: 0.9784241167379383
key: test_recall
value: [1. 0.875 1. 1. 0.85714286 0.85714286
0.75 0.875 1. 0.875 ]
mean value: 0.9089285714285714
key: train_recall
value: [1. 0.97058824 1. 1. 0.97101449 1.
1. 1. 1. 1. ]
mean value: 0.994160272804774
key: test_roc_auc
value: [1. 0.875 0.875 1. 0.86607143 0.80357143
0.80357143 0.86607143 0.92857143 0.9375 ]
mean value: 0.8955357142857143
key: train_roc_auc
value: [0.99264706 0.97058824 0.99264706 0.98529412 0.97080136 0.99264706
0.99275362 0.99275362 0.98550725 0.98550725]
mean value: 0.986114663256607
key: test_jcc
value: [1. 0.77777778 0.77777778 1. 0.75 0.66666667
0.66666667 0.77777778 0.88888889 0.875 ]
mean value: 0.8180555555555555
key: train_jcc
value: [0.98550725 0.94285714 0.98571429 0.97183099 0.94366197 0.98571429
0.98550725 0.98550725 0.97142857 0.97142857]
mean value: 0.9729157554019771
MCC on Blind test: 0.1
Accuracy on Blind test: 0.8
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01659274 0.00693059 0.00675678 0.0067389 0.00669909 0.00674319
0.00680494 0.00672269 0.00675416 0.00671124]
mean value: 0.007745432853698731
key: score_time
value: [0.01080561 0.00839496 0.00837088 0.00777602 0.00775194 0.00776267
0.00777245 0.00776482 0.00776577 0.00774288]
mean value: 0.008190798759460449
key: test_mcc
value: [0.25 0.25819889 0.07142857 0.33928571 0.46428571 0.13363062
0.33928571 0.46428571 0.33928571 0.49099025]
mean value: 0.3150676906591499
key: train_mcc
value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782
0.4312221 0.41698711 0.44522592 0.43208129]
mean value: 0.46575135687893415
key: test_accuracy
value: [0.625 0.625 0.53333333 0.66666667 0.73333333 0.53333333
0.66666667 0.73333333 0.66666667 0.73333333]
mean value: 0.6516666666666666
key: train_accuracy
value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701
0.71532847 0.7080292 0.72262774 0.71532847]
mean value: 0.7317410905968227
key: test_fscore
value: [0.625 0.57142857 0.53333333 0.66666667 0.71428571 0.63157895
0.66666667 0.75 0.66666667 0.71428571]
mean value: 0.6539912280701754
key: train_fscore
value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874
0.71942446 0.71428571 0.72058824 0.72340426]
mean value: 0.7420250432480666
key: test_precision
value: [0.625 0.66666667 0.5 0.625 0.71428571 0.5
0.71428571 0.75 0.71428571 0.83333333]
mean value: 0.6642857142857143
key: train_precision
value: [0.72 0.7037037 0.73611111 0.70886076 0.73684211 0.71621622
0.70422535 0.69444444 0.72058824 0.69863014]
mean value: 0.7139622064625399
key: test_recall
value: [0.625 0.5 0.57142857 0.71428571 0.71428571 0.85714286
0.625 0.75 0.625 0.625 ]
mean value: 0.6607142857142857
key: train_recall
value: [0.79411765 0.83823529 0.76811594 0.8115942 0.8115942 0.76811594
0.73529412 0.73529412 0.72058824 0.75 ]
mean value: 0.7732949701619778
key: test_roc_auc
value: [0.625 0.625 0.53571429 0.66964286 0.73214286 0.55357143
0.66964286 0.73214286 0.66964286 0.74107143]
mean value: 0.6553571428571429
key: train_roc_auc
value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621
0.71547315 0.70822677 0.72261296 0.71557971]
mean value: 0.7316602728047741
key: test_jcc
value: [0.45454545 0.4 0.36363636 0.5 0.55555556 0.46153846
0.5 0.6 0.5 0.55555556]
mean value: 0.4890831390831391
key: train_jcc
value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889
0.56179775 0.55555556 0.56321839 0.56666667]
mean value: 0.5902615907742418
MCC on Blind test: 0.1
Accuracy on Blind test: 0.58
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.05043268 0.03555036 0.05953455 0.03447628 0.03820968 0.0348382
0.03491855 0.03489041 0.03481722 0.03513861]
mean value: 0.03928065299987793
key: score_time
value: [0.01032662 0.01029825 0.0103786 0.01031733 0.01061702 0.01034212
0.01034379 0.01036835 0.01033378 0.01031613]
mean value: 0.010364198684692382
key: test_mcc
value: [1. 0.8819171 1. 1. 0.875 1.
0.87287156 1. 0.87287156 0.875 ]
mean value: 0.9377660225576135
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 1. 1. 0.93333333 1.
0.93333333 1. 0.93333333 0.93333333]
mean value: 0.9670833333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 1. 1. 0.93333333 1.
0.94117647 1. 0.94117647 0.93333333]
mean value: 0.9690196078431372
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 1. 1. 0.875 1.
0.88888889 1. 0.88888889 1. ]
mean value: 0.9541666666666666
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.875]
mean value: 0.9875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 1. 1. 0.9375 1.
0.92857143 1. 0.92857143 0.9375 ]
mean value: 0.9669642857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 1. 1. 0.875 1.
0.88888889 1. 0.88888889 0.875 ]
mean value: 0.9416666666666667
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01003599 0.01160932 0.01194644 0.01213264 0.0120163 0.01201224
0.01192141 0.01221108 0.01222897 0.01220608]
mean value: 0.011832046508789062
key: score_time
value: [0.01034403 0.01014495 0.01055765 0.01064253 0.01057649 0.01061487
0.01058221 0.01066971 0.01062822 0.01060128]
mean value: 0.01053619384765625
key: test_mcc
value: [0.8819171 0.77459667 0.49099025 1. 0.73214286 0.73214286
0.87287156 0.76376262 0.75592895 0.75592895]
mean value: 0.7760281809053229
key: train_mcc
value: [0.89949371 0.91533482 0.90246052 0.91392776 0.92787101 0.95710706
0.9139999 0.92951942 0.92791659 0.92951942]
mean value: 0.9217150203470457
key: test_accuracy
value: [0.9375 0.875 0.73333333 1. 0.86666667 0.86666667
0.93333333 0.86666667 0.86666667 0.86666667]
mean value: 0.88125
key: train_accuracy
value: [0.94852941 0.95588235 0.94890511 0.95620438 0.96350365 0.97810219
0.95620438 0.96350365 0.96350365 0.96350365]
mean value: 0.9597842421640189
key: test_fscore
value: [0.94117647 0.88888889 0.75 1. 0.85714286 0.85714286
0.94117647 0.85714286 0.88888889 0.88888889]
mean value: 0.8870448179271708
key: train_fscore
value: [0.95035461 0.95774648 0.95172414 0.95774648 0.96453901 0.9787234
0.95714286 0.96453901 0.96402878 0.96453901]
mean value: 0.961108376525978
key: test_precision
value: [0.88888889 0.8 0.66666667 1. 0.85714286 0.85714286
0.88888889 1. 0.8 0.8 ]
mean value: 0.8558730158730159
key: train_precision
value: [0.91780822 0.91891892 0.90789474 0.93150685 0.94444444 0.95833333
0.93055556 0.93150685 0.94366197 0.93150685]
mean value: 0.9316137728048631
key: test_recall
value: [1. 1. 0.85714286 1. 0.85714286 0.85714286
1. 0.75 1. 1. ]
mean value: 0.9321428571428572
key: train_recall
value: [0.98529412 1. 1. 0.98550725 0.98550725 1.
0.98529412 1. 0.98529412 1. ]
mean value: 0.99268968456948
key: test_roc_auc
value: [0.9375 0.875 0.74107143 1. 0.86607143 0.86607143
0.92857143 0.875 0.85714286 0.85714286]
mean value: 0.8803571428571428
key: train_roc_auc
value: [0.94852941 0.95588235 0.94852941 0.95598892 0.96334186 0.97794118
0.95641517 0.96376812 0.96366155 0.96376812]
mean value: 0.9597826086956522
key: test_jcc
value: [0.88888889 0.8 0.6 1. 0.75 0.75
0.88888889 0.75 0.8 0.8 ]
mean value: 0.8027777777777778
key: train_jcc
value: [0.90540541 0.91891892 0.90789474 0.91891892 0.93150685 0.95833333
0.91780822 0.93150685 0.93055556 0.93150685]
mean value: 0.9252355636097525
MCC on Blind test: 0.05
Accuracy on Blind test: 0.64
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00948811 0.00723052 0.00743365 0.00692177 0.00683284 0.00688004
0.00742602 0.00687981 0.00724554 0.00696445]
mean value: 0.00733027458190918
key: score_time
value: [0.01050973 0.00838256 0.00804853 0.00779343 0.00792456 0.00785303
0.00853586 0.00783825 0.00832796 0.00790906]
mean value: 0.008312296867370606
key: test_mcc
value: [0.37796447 0.25819889 0.37796447 0.32732684 0.60714286 0.37796447
0.49099025 0.33928571 0.33928571 0.46428571]
mean value: 0.3960409397159814
key: train_mcc
value: [0.47243088 0.54894692 0.5182264 0.47592003 0.46076782 0.5335339
0.4599318 0.4312221 0.47473887 0.47442455]
mean value: 0.4850143267959903
key: test_accuracy
value: [0.6875 0.625 0.66666667 0.66666667 0.8 0.66666667
0.73333333 0.66666667 0.66666667 0.73333333]
mean value: 0.6912499999999999
key: train_accuracy
value: [0.73529412 0.77205882 0.75912409 0.73722628 0.72992701 0.76642336
0.72992701 0.71532847 0.73722628 0.73722628]
mean value: 0.7419761700300558
key: test_fscore
value: [0.66666667 0.57142857 0.70588235 0.61538462 0.8 0.70588235
0.71428571 0.66666667 0.66666667 0.75 ]
mean value: 0.6862863606981253
key: train_fscore
value: [0.74647887 0.7862069 0.76258993 0.75 0.74125874 0.77464789
0.72992701 0.71942446 0.73913043 0.73529412]
mean value: 0.7484958346591992
key: test_precision
value: [0.71428571 0.66666667 0.6 0.66666667 0.75 0.6
0.83333333 0.71428571 0.71428571 0.75 ]
mean value: 0.700952380952381
key: train_precision
value: [0.71621622 0.74025974 0.75714286 0.72 0.71621622 0.75342466
0.72463768 0.70422535 0.72857143 0.73529412]
mean value: 0.729598826685986
key: test_recall
value: [0.625 0.5 0.85714286 0.57142857 0.85714286 0.85714286
0.625 0.625 0.625 0.75 ]
mean value: 0.6892857142857143
key: train_recall
value: [0.77941176 0.83823529 0.76811594 0.7826087 0.76811594 0.79710145
0.73529412 0.73529412 0.75 0.73529412]
mean value: 0.7689471440750213
key: test_roc_auc
value: [0.6875 0.625 0.67857143 0.66071429 0.80357143 0.67857143
0.74107143 0.66964286 0.66964286 0.73214286]
mean value: 0.6946428571428571
key: train_roc_auc
value: [0.73529412 0.77205882 0.75905797 0.73689258 0.72964621 0.76619778
0.7299659 0.71547315 0.73731884 0.73721228]
mean value: 0.7419117647058824
key: test_jcc
value: [0.5 0.4 0.54545455 0.44444444 0.66666667 0.54545455
0.55555556 0.5 0.5 0.6 ]
mean value: 0.5257575757575758
key: train_jcc
value: [0.59550562 0.64772727 0.61627907 0.6 0.58888889 0.63218391
0.57471264 0.56179775 0.5862069 0.58139535]
mean value: 0.5984697399283192
MCC on Blind test: 0.1
Accuracy on Blind test: 0.6
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00826144 0.00782967 0.00737524 0.00810552 0.00831509 0.00756001
0.00788832 0.00767446 0.00802422 0.00808358]
mean value: 0.007911753654479981
key: score_time
value: [0.00887156 0.00873542 0.00787902 0.00805974 0.00852108 0.00807381
0.00853586 0.00868702 0.00864601 0.00846553]
mean value: 0.008447504043579102
key: test_mcc
value: [0.77459667 0.5 0.47245559 0.64465837 0.73214286 0.60714286
0.64465837 0.87287156 0.64465837 0.6000992 ]
mean value: 0.6493283847542592
key: train_mcc
value: [0.76894131 0.91334626 0.54803747 0.87326937 0.94160273 0.83757093
0.91597649 0.88476385 0.87099729 0.88476385]
mean value: 0.8439269536443883
key: test_accuracy
value: [0.875 0.75 0.73333333 0.8 0.86666667 0.8
0.8 0.93333333 0.8 0.8 ]
mean value: 0.8158333333333334
key: train_accuracy
value: [0.875 0.95588235 0.72992701 0.93430657 0.97080292 0.91240876
0.95620438 0.94160584 0.93430657 0.94160584]
mean value: 0.9152050236152856
key: test_fscore
value: [0.85714286 0.75 0.66666667 0.72727273 0.85714286 0.8
0.84210526 0.94117647 0.84210526 0.82352941]
mean value: 0.8107141516893839
key: train_fscore
value: [0.85950413 0.95454545 0.63366337 0.93129771 0.97101449 0.92
0.95774648 0.94285714 0.93617021 0.94285714]
mean value: 0.9049656133144264
key: test_precision
value: [1. 0.75 0.8 1. 0.85714286 0.75
0.72727273 0.88888889 0.72727273 0.77777778]
mean value: 0.8278354978354978
key: train_precision
value: [0.98113208 0.984375 1. 0.98387097 0.97101449 0.85185185
0.91891892 0.91666667 0.90410959 0.91666667]
mean value: 0.9428606229112457
key: test_recall
value: [0.75 0.75 0.57142857 0.57142857 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.8232142857142857
key: train_recall
value: [0.76470588 0.92647059 0.46376812 0.88405797 0.97101449 1.
1. 0.97058824 0.97058824 0.97058824]
mean value: 0.8921781756180733
key: test_roc_auc
value: [0.875 0.75 0.72321429 0.78571429 0.86607143 0.80357143
0.78571429 0.92857143 0.78571429 0.79464286]
mean value: 0.8098214285714286
key: train_roc_auc
value: [0.875 0.95588235 0.73188406 0.93467604 0.97080136 0.91176471
0.95652174 0.94181586 0.93456948 0.94181586]
mean value: 0.9154731457800511
key: test_jcc
value: [0.75 0.6 0.5 0.57142857 0.75 0.66666667
0.72727273 0.88888889 0.72727273 0.7 ]
mean value: 0.6881529581529582
key: train_jcc
value: [0.75362319 0.91304348 0.46376812 0.87142857 0.94366197 0.85185185
0.91891892 0.89189189 0.88 0.89189189]
mean value: 0.8380079880422807
MCC on Blind test: 0.06
Accuracy on Blind test: 0.89
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00997114 0.01002455 0.00783682 0.00783896 0.00782728 0.00736952
0.00789356 0.00743914 0.00764585 0.0072844 ]
mean value: 0.00811312198638916
key: score_time
value: [0.01067495 0.00957513 0.00809073 0.00825882 0.0082314 0.00791526
0.00799417 0.00835299 0.00847554 0.00832438]
mean value: 0.008589339256286622
key: test_mcc
value: [0.77459667 0.37796447 0.36689969 0.60714286 0.49099025 0.73214286
0.6000992 0.73214286 0.75592895 0.73214286]
mean value: 0.6170050660873226
key: train_mcc
value: [0.72669793 0.88580789 0.78788403 0.74493056 0.77817796 0.91597649
0.92951942 0.85434012 0.86000692 0.91240409]
mean value: 0.8395745411348854
key: test_accuracy
value: [0.875 0.6875 0.6 0.8 0.73333333 0.86666667
0.8 0.86666667 0.86666667 0.86666667]
mean value: 0.79625
key: train_accuracy
value: [0.84558824 0.94117647 0.88321168 0.86861314 0.88321168 0.95620438
0.96350365 0.9270073 0.9270073 0.95620438]
mean value: 0.9151728209531989
key: test_fscore
value: [0.85714286 0.66666667 0.7 0.8 0.75 0.85714286
0.82352941 0.875 0.88888889 0.875 ]
mean value: 0.8093370681605976
key: train_fscore
value: [0.8173913 0.93846154 0.8961039 0.87837838 0.89333333 0.95454545
0.96453901 0.92537313 0.93055556 0.95588235]
mean value: 0.9154563955087716
key: test_precision
value: [1. 0.71428571 0.53846154 0.75 0.66666667 0.85714286
0.77777778 0.875 0.8 0.875 ]
mean value: 0.7854334554334554
key: train_precision
value: [1. 0.98387097 0.81176471 0.82278481 0.82716049 1.
0.93150685 0.93939394 0.88157895 0.95588235]
mean value: 0.9153943066596637
key: test_recall
value: [0.75 0.625 1. 0.85714286 0.85714286 0.85714286
0.875 0.875 1. 0.875 ]
mean value: 0.8571428571428571
key: train_recall
value: [0.69117647 0.89705882 1. 0.94202899 0.97101449 0.91304348
1. 0.91176471 0.98529412 0.95588235]
mean value: 0.9267263427109974
key: test_roc_auc
value: [0.875 0.6875 0.625 0.80357143 0.74107143 0.86607143
0.79464286 0.86607143 0.85714286 0.86607143]
mean value: 0.7982142857142858
key: train_roc_auc
value: [0.84558824 0.94117647 0.88235294 0.86807332 0.88256607 0.95652174
0.96376812 0.92689685 0.92742967 0.95620205]
mean value: 0.9150575447570333
key: test_jcc
value: [0.75 0.5 0.53846154 0.66666667 0.6 0.75
0.7 0.77777778 0.8 0.77777778]
mean value: 0.6860683760683761
key: train_jcc
value: [0.69117647 0.88405797 0.81176471 0.78313253 0.80722892 0.91304348
0.93150685 0.86111111 0.87012987 0.91549296]
mean value: 0.8468644859831611
MCC on Blind test: 0.04
Accuracy on Blind test: 0.85
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.07661414 0.06280541 0.0673337 0.06278038 0.06312895 0.06578207
0.06545353 0.06469059 0.06454372 0.06606507]
mean value: 0.06591975688934326
key: score_time
value: [0.01440525 0.01476049 0.01512003 0.01427507 0.01462126 0.01519179
0.01491117 0.01524901 0.01458573 0.01450872]
mean value: 0.01476285457611084
key: test_mcc
value: [1. 0.8819171 0.76376262 0.875 0.73214286 0.87287156
0.87287156 1. 0.87287156 0.73214286]
mean value: 0.8603580116631793
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 0.86666667 0.93333333 0.86666667 0.93333333
0.93333333 1. 0.93333333 0.86666667]
mean value: 0.9270833333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 0.875 0.93333333 0.85714286 0.92307692
0.94117647 1. 0.94117647 0.875 ]
mean value: 0.9287082525317819
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 0.77777778 0.875 0.85714286 1.
0.88888889 1. 0.88888889 0.875 ]
mean value: 0.9051587301587302
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9589285714285715
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 0.875 0.9375 0.86607143 0.92857143
0.92857143 1. 0.92857143 0.86607143]
mean value: 0.9267857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 0.77777778 0.875 0.75 0.85714286
0.88888889 1. 0.88888889 0.77777778]
mean value: 0.870436507936508
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.76
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02507782 0.02816606 0.04579067 0.0453043 0.04246235 0.02354765
0.0451479 0.02337193 0.02406144 0.03010583]
mean value: 0.0333035945892334
key: score_time
value: [0.0173595 0.02051401 0.03634977 0.03514004 0.01607704 0.03413272
0.02627468 0.01672935 0.02149177 0.03459334]
mean value: 0.025866222381591798
key: test_mcc
value: [1. 0.8819171 1. 1. 0.875 0.73214286
0.87287156 1. 0.87287156 0.875 ]
mean value: 0.9109803082718992
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.9985507246376811
key: test_accuracy
value: [1. 0.9375 1. 1. 0.93333333 0.86666667
0.93333333 1. 0.93333333 0.93333333]
mean value: 0.95375
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 0.99270073 1. 1. ]
mean value: 0.9992700729927008
key: test_fscore
value: [1. 0.94117647 1. 1. 0.93333333 0.85714286
0.94117647 1. 0.94117647 0.93333333]
mean value: 0.954733893557423
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 0.99270073 1. 1. ]
mean value: 0.9992700729927008
key: test_precision
value: [1. 0.88888889 1. 1. 0.875 0.85714286
0.88888889 1. 0.88888889 1. ]
mean value: 0.9398809523809524
key: train_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.9985507246376811
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 1. 1. 0.9375 0.86607143
0.92857143 1. 0.92857143 0.9375 ]
mean value: 0.9535714285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 0.99275362 1. 1. ]
mean value: 0.9992753623188406
key: test_jcc
value: [1. 0.88888889 1. 1. 0.875 0.75
0.88888889 1. 0.88888889 0.875 ]
mean value: 0.9166666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 0.98550725 1. 1. ]
mean value: 0.9985507246376811
MCC on Blind test: 0.13
Accuracy on Blind test: 0.86
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03250933 0.03897357 0.01707721 0.01700234 0.01710153 0.01721501
0.04007459 0.0396409 0.03977489 0.04000473]
mean value: 0.02993741035461426
key: score_time
value: [0.01946115 0.01910114 0.01107264 0.01099253 0.01101065 0.01094913
0.02085972 0.01942563 0.01113796 0.02103782]
mean value: 0.015504837036132812
key: test_mcc
value: [0.8819171 0.75 0.37796447 0.73214286 0.76376262 0.73214286
0.76376262 0.60714286 0.6000992 0.60714286]
mean value: 0.6816077435069778
key: train_mcc
value: [0.95598573 0.98540068 0.98550418 0.98550725 0.97080136 0.97080136
0.98550725 0.97080136 0.97120941 0.97080136]
mean value: 0.9752319946905791
key: test_accuracy
value: [0.9375 0.875 0.66666667 0.86666667 0.86666667 0.86666667
0.86666667 0.8 0.8 0.8 ]
mean value: 0.8345833333333333
key: train_accuracy
value: [0.97794118 0.99264706 0.99270073 0.99270073 0.98540146 0.98540146
0.99270073 0.98540146 0.98540146 0.98540146]
mean value: 0.9875697724345213
key: test_fscore
value: [0.94117647 0.875 0.70588235 0.85714286 0.875 0.85714286
0.85714286 0.8 0.82352941 0.8 ]
mean value: 0.8392016806722689
key: train_fscore
value: [0.97777778 0.99259259 0.99280576 0.99270073 0.98550725 0.98550725
0.99270073 0.98529412 0.98507463 0.98529412]
mean value: 0.9875254940533481
key: test_precision
value: [0.88888889 0.875 0.6 0.85714286 0.77777778 0.85714286
1. 0.85714286 0.77777778 0.85714286]
mean value: 0.8348015873015873
key: train_precision
value: [0.98507463 1. 0.98571429 1. 0.98550725 0.98550725
0.98550725 0.98529412 1. 0.98529412]
mean value: 0.989789888700451
key: test_recall
value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286
0.75 0.75 0.875 0.75 ]
mean value: 0.8571428571428571
key: train_recall
value: [0.97058824 0.98529412 1. 0.98550725 0.98550725 0.98550725
1. 0.98529412 0.97058824 0.98529412]
mean value: 0.9853580562659847
key: test_roc_auc
value: [0.9375 0.875 0.67857143 0.86607143 0.875 0.86607143
0.875 0.80357143 0.79464286 0.80357143]
mean value: 0.8375
key: train_roc_auc
value: [0.97794118 0.99264706 0.99264706 0.99275362 0.98540068 0.98540068
0.99275362 0.98540068 0.98529412 0.98540068]
mean value: 0.9875639386189259
key: test_jcc
value: [0.88888889 0.77777778 0.54545455 0.75 0.77777778 0.75
0.75 0.66666667 0.7 0.66666667]
mean value: 0.7273232323232323
key: train_jcc
value: [0.95652174 0.98529412 0.98571429 0.98550725 0.97142857 0.97142857
0.98550725 0.97101449 0.97058824 0.97101449]
mean value: 0.9754018998903909
MCC on Blind test: 0.06
Accuracy on Blind test: 0.66
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.10531926 0.09862638 0.1012013 0.09502292 0.09023929 0.08958817
0.09062433 0.07970119 0.08648419 0.07744169]
mean value: 0.09142487049102783
key: score_time
value: [0.00943542 0.00918198 0.00938845 0.00950336 0.00970459 0.00936961
0.00830388 0.00854349 0.00833607 0.00825047]
mean value: 0.009001731872558594
key: test_mcc
value: [0.8819171 0.8819171 1. 1. 0.875 0.87287156
0.87287156 0.87287156 0.87287156 0.875 ]
mean value: 0.9005320451152271
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.9375 1. 1. 0.93333333 0.93333333
0.93333333 0.93333333 0.93333333 0.93333333]
mean value: 0.9475
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.94117647 1. 1. 0.93333333 0.92307692
0.94117647 0.94117647 0.94117647 0.93333333]
mean value: 0.9495625942684767
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.88888889 1. 1. 0.875 1.
0.88888889 0.88888889 0.88888889 1. ]
mean value: 0.9319444444444445
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.9375 1. 1. 0.9375 0.92857143
0.92857143 0.92857143 0.92857143 0.9375 ]
mean value: 0.9464285714285714
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.88888889 1. 1. 0.875 0.85714286
0.88888889 0.88888889 0.88888889 0.875 ]
mean value: 0.9051587301587302
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00895786 0.01093102 0.01075339 0.01098084 0.01611209 0.01137829
0.01158309 0.01124573 0.01122022 0.01167703]
mean value: 0.01148395538330078
key: score_time
value: [0.01024771 0.01018643 0.01022196 0.01059794 0.010957 0.01306653
0.01069069 0.01379347 0.01385617 0.01327443]
mean value: 0.011689233779907226
key: test_mcc
value: [1. 0.67419986 0.75592895 0.75592895 0.75592895 0.53452248
0.56407607 0.60714286 0.76376262 0.76376262]
mean value: 0.7175253347956024
key: train_mcc
value: [0.98540068 1. 1. 1. 1. 1.
0.87609014 1. 1. 1. ]
mean value: 0.9861490818102587
key: test_accuracy
value: [1. 0.8125 0.86666667 0.86666667 0.86666667 0.73333333
0.73333333 0.8 0.86666667 0.86666667]
mean value: 0.84125
key: train_accuracy
value: [0.99264706 1. 1. 1. 1. 1.
0.93430657 1. 1. 1. ]
mean value: 0.9926953628166595
key: test_fscore
value: [1. 0.76923077 0.83333333 0.83333333 0.83333333 0.6
0.66666667 0.8 0.85714286 0.85714286]
mean value: 0.805018315018315
key: train_fscore
value: [0.99259259 1. 1. 1. 1. 1.
0.92913386 1. 1. 1. ]
mean value: 0.9921726450860309
key: test_precision
value: [1. 1. 1. 1. 1. 1.
1. 0.85714286 1. 1. ]
mean value: 0.9857142857142858
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143
0.5 0.75 0.75 0.75 ]
mean value: 0.6946428571428571
key: train_recall
value: [0.98529412 1. 1. 1. 1. 1.
0.86764706 1. 1. 1. ]
mean value: 0.9852941176470589
key: test_roc_auc
value: [1. 0.8125 0.85714286 0.85714286 0.85714286 0.71428571
0.75 0.80357143 0.875 0.875 ]
mean value: 0.8401785714285714
key: train_roc_auc
value: [0.99264706 1. 1. 1. 1. 1.
0.93382353 1. 1. 1. ]
mean value: 0.9926470588235294
key: test_jcc
value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143
0.5 0.66666667 0.75 0.75 ]
mean value: 0.6863095238095238
key: train_jcc
value: [0.98529412 1. 1. 1. 1. 1.
0.86764706 1. 1. 1. ]
mean value: 0.9852941176470589
MCC on Blind test: -0.02
Accuracy on Blind test: 0.95
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01463962 0.00981808 0.00781131 0.00768018 0.00760412 0.00761986
0.00743604 0.00750971 0.00748968 0.00746846]
mean value: 0.008507704734802246
key: score_time
value: [0.01040816 0.0082798 0.00810122 0.00809526 0.00800514 0.0080471
0.00783396 0.00794506 0.00791073 0.00810385]
mean value: 0.008273029327392578
key: test_mcc
value: [0.8819171 0.62994079 0.37796447 0.87287156 0.73214286 0.73214286
0.75592895 1. 0.75592895 0.6000992 ]
mean value: 0.7338936730461708
key: train_mcc
value: [0.82388584 0.88273483 0.85440207 0.85434012 0.89863497 0.88320546
0.90025835 0.84026462 0.88360693 0.86948194]
mean value: 0.8690815123547234
key: test_accuracy
value: [0.9375 0.8125 0.66666667 0.93333333 0.86666667 0.86666667
0.86666667 1. 0.86666667 0.8 ]
mean value: 0.8616666666666667
key: train_accuracy
value: [0.91176471 0.94117647 0.9270073 0.9270073 0.94890511 0.94160584
0.94890511 0.91970803 0.94160584 0.93430657]
mean value: 0.9341992271361099
key: test_fscore
value: [0.93333333 0.82352941 0.70588235 0.92307692 0.85714286 0.85714286
0.88888889 1. 0.88888889 0.82352941]
mean value: 0.8701414924944337
key: train_fscore
value: [0.91304348 0.94202899 0.92647059 0.92857143 0.95035461 0.94202899
0.95035461 0.92086331 0.94202899 0.9352518 ]
mean value: 0.9350996779361157
key: test_precision
value: [1. 0.77777778 0.6 1. 0.85714286 0.85714286
0.8 1. 0.8 0.77777778]
mean value: 0.846984126984127
key: train_precision
value: [0.9 0.92857143 0.94029851 0.91549296 0.93055556 0.94202899
0.91780822 0.90140845 0.92857143 0.91549296]
mean value: 0.9220228491043612
key: test_recall
value: [0.875 0.875 0.85714286 0.85714286 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9053571428571429
key: train_recall
value: [0.92647059 0.95588235 0.91304348 0.94202899 0.97101449 0.94202899
0.98529412 0.94117647 0.95588235 0.95588235]
mean value: 0.9488704177323103
key: test_roc_auc
value: [0.9375 0.8125 0.67857143 0.92857143 0.86607143 0.86607143
0.85714286 1. 0.85714286 0.79464286]
mean value: 0.8598214285714286
key: train_roc_auc
value: [0.91176471 0.94117647 0.92710997 0.92689685 0.94874254 0.94160273
0.9491688 0.9198636 0.94170929 0.93446292]
mean value: 0.9342497868712702
key: test_jcc
value: [0.875 0.7 0.54545455 0.85714286 0.75 0.75
0.8 1. 0.8 0.7 ]
mean value: 0.7777597402597403
key: train_jcc
value: [0.84 0.89041096 0.8630137 0.86666667 0.90540541 0.89041096
0.90540541 0.85333333 0.89041096 0.87837838]
mean value: 0.8783435764531655
MCC on Blind test: 0.07
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07311392 0.06029248 0.06152678 0.05939054 0.05958748 0.05935431
0.05937529 0.05921316 0.0613966 0.06374431]
mean value: 0.06169948577880859
key: score_time
value: [0.00807023 0.00803971 0.00803089 0.00806856 0.00811672 0.00805664
0.00809479 0.00807714 0.00883532 0.0086627 ]
mean value: 0.008205270767211914
key: test_mcc
value: [0.8819171 0.62994079 0.49099025 0.87287156 0.73214286 0.73214286
0.75592895 1. 0.75592895 0.6000992 ]
mean value: 0.7451962510483463
key: train_mcc
value: [0.85442069 0.87000211 0.89863497 0.85434012 0.92787101 0.91277477
0.90025835 0.8555278 0.88360693 0.88668406]
mean value: 0.8844120809526788
key: test_accuracy
value: [0.9375 0.8125 0.73333333 0.93333333 0.86666667 0.86666667
0.86666667 1. 0.86666667 0.8 ]
mean value: 0.8683333333333334
key: train_accuracy
value: [0.92647059 0.93382353 0.94890511 0.9270073 0.96350365 0.95620438
0.94890511 0.9270073 0.94160584 0.94160584]
mean value: 0.9415038643194504
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:163: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:166: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_fscore
value: [0.94117647 0.82352941 0.75 0.92307692 0.85714286 0.85714286
0.88888889 1. 0.88888889 0.82352941]
mean value: 0.8753375709258062
key: train_fscore
value: [0.92857143 0.93617021 0.95035461 0.92857143 0.96453901 0.95714286
0.95035461 0.92857143 0.94202899 0.94366197]
mean value: 0.9429966539911687
key: test_precision
value: [0.88888889 0.77777778 0.66666667 1. 0.85714286 0.85714286
0.8 1. 0.8 0.77777778]
mean value: 0.8425396825396825
key: train_precision
value: [0.90277778 0.90410959 0.93055556 0.91549296 0.94444444 0.94366197
0.91780822 0.90277778 0.92857143 0.90540541]
mean value: 0.9195605127329032
key: test_recall
value: [1. 0.875 0.85714286 0.85714286 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9178571428571428
key: train_recall
value: [0.95588235 0.97058824 0.97101449 0.94202899 0.98550725 0.97101449
0.98529412 0.95588235 0.95588235 0.98529412]
mean value: 0.9678388746803069
key: test_roc_auc
value: [0.9375 0.8125 0.74107143 0.92857143 0.86607143 0.86607143
0.85714286 1. 0.85714286 0.79464286]
mean value: 0.8660714285714286
key: train_roc_auc
value: [0.92647059 0.93382353 0.94874254 0.92689685 0.96334186 0.95609548
0.9491688 0.92721654 0.94170929 0.94192242]
mean value: 0.941538789428815
key: test_jcc
value: [0.88888889 0.7 0.6 0.85714286 0.75 0.75
0.8 1. 0.8 0.7 ]
mean value: 0.7846031746031746
key: train_jcc
value: [0.86666667 0.88 0.90540541 0.86666667 0.93150685 0.91780822
0.90540541 0.86666667 0.89041096 0.89333333]
mean value: 0.8923870171541405
MCC on Blind test: 0.06
Accuracy on Blind test: 0.66
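Note: the key / value / mean value triples above have the shape of the dictionary returned by scikit-learn's cross_validate when it is given a dictionary of scorers and return_train_score=True. The sketch below reproduces that layout on synthetic data. It is an illustration under assumptions, not the script that produced this log: the data, the scorer definitions behind each name, and the fold settings are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real features are not available here).
X, y = make_classification(n_samples=120, n_features=10, random_state=42)

# Scorer names chosen to match the keys in the log; the scorer behind each
# name is an assumption.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(random_state=42)),
])

# 10 folds, matching the 10 values per key in the log (fold strategy assumed).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each entry in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Each entry in scores is an array with one element per fold, which is why every "value:" line in the log holds ten numbers.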
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01548505 0.01318932 0.01122952 0.01181602 0.01175117 0.01169848
0.01112366 0.01132369 0.0109632 0.01237416]
mean value: 0.012095427513122559
key: score_time
value: [0.01040673 0.00819731 0.0085001 0.00783849 0.00782514 0.00842071
0.0079546 0.00785255 0.00782204 0.00846028]
mean value: 0.008327794075012208
key: test_mcc
value: [0.35 0.35 0.8 1. 0.79056942 0.8
0.5 0.5 0.25819889 1. ]
mean value: 0.6348768304789256
key: train_mcc
value: [0.87044534 0.87035806 0.87044534 0.81836616 0.81836616 0.84412955
0.84615385 0.84615385 0.84615385 0.84615385]
mean value: 0.8476726003234742
key: test_accuracy
value: [0.66666667 0.66666667 0.88888889 1. 0.88888889 0.88888889
0.75 0.75 0.625 1. ]
mean value: 0.8125
key: train_accuracy
value: [0.93506494 0.93506494 0.93506494 0.90909091 0.90909091 0.92207792
0.92307692 0.92307692 0.92307692 0.92307692]
mean value: 0.9237762237762238
key: test_fscore
value: [0.66666667 0.66666667 0.88888889 1. 0.90909091 0.88888889
0.75 0.75 0.57142857 1. ]
mean value: 0.8091630591630592
key: train_fscore
value: [0.93506494 0.93670886 0.93506494 0.90666667 0.90666667 0.92105263
0.92307692 0.92307692 0.92307692 0.92307692]
mean value: 0.9233532388109337
key: test_precision
value: [0.6 0.6 0.8 1. 0.83333333 1.
0.75 0.75 0.66666667 1. ]
mean value: 0.8
key: train_precision
value: [0.94736842 0.925 0.94736842 0.91891892 0.91891892 0.92105263
0.92307692 0.92307692 0.92307692 0.92307692]
mean value: 0.9270935003829741
key: test_recall
value: [0.75 0.75 1. 1. 1. 0.8 0.75 0.75 0.5 1. ]
mean value: 0.83
key: train_recall
value: [0.92307692 0.94871795 0.92307692 0.89473684 0.89473684 0.92105263
0.92307692 0.92307692 0.92307692 0.92307692]
mean value: 0.9197705802968961
key: test_roc_auc
value: [0.675 0.675 0.9 1. 0.875 0.9 0.75 0.75 0.625 1. ]
mean value: 0.8150000000000001
key: train_roc_auc
value: [0.93522267 0.93488529 0.93522267 0.90890688 0.90890688 0.92206478
0.92307692 0.92307692 0.92307692 0.92307692]
mean value: 0.9237516869095818
key: test_jcc
value: [0.5 0.5 0.8 1. 0.83333333 0.8
0.6 0.6 0.4 1. ]
mean value: 0.7033333333333334
key: train_jcc
value: [0.87804878 0.88095238 0.87804878 0.82926829 0.82926829 0.85365854
0.85714286 0.85714286 0.85714286 0.85714286]
mean value: 0.8577816492450638
MCC on Blind test: 0.1
Accuracy on Blind test: 0.57
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.28448343 0.26222992 0.30907059 0.30741835 0.29078984 0.30970097
0.2858026 0.2821455 0.30858493 0.30905151]
mean value: 0.2949277639389038
key: score_time
value: [0.00840044 0.00826621 0.00892925 0.00815058 0.00974989 0.00875688
0.00868368 0.00955057 0.00915575 0.00842237]
mean value: 0.008806562423706055
key: test_mcc
value: [0.1 0.35 0.8 0.79056942 1. 1.
1. 0.5 0.57735027 1. ]
mean value: 0.711791968423172
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.88888889 1. 1.
1. 0.75 0.75 1. ]
mean value: 0.85
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.66666667 0.88888889 0.90909091 1. 1.
1. 0.75 0.66666667 1. ]
mean value: 0.8381313131313131
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.6 0.8 0.83333333 1. 1.
1. 0.75 1. 1. ]
mean value: 0.8483333333333334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.75 1. 1. 1. 1. 1. 0.75 0.5 1. ]
mean value: 0.85
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.55 0.675 0.9 0.875 1. 1. 1. 0.75 0.75 1. ]
mean value: 0.85
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.5 0.8 0.83333333 1. 1.
1. 0.6 0.5 1. ]
mean value: 0.7566666666666667
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.63
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0089488 0.00875568 0.0074203 0.00725079 0.00705767 0.00712848
0.00707436 0.00722599 0.00702119 0.00702357]
mean value: 0.007490682601928711
key: score_time
value: [0.0106101 0.01032877 0.00850701 0.00852227 0.0084753 0.00805879
0.00836706 0.00811815 0.00836396 0.00837588]
mean value: 0.008772730827331543
key: test_mcc
value: [ 0.39528471 0.5976143 0.5976143 0.47809144 0.15811388 0.35
0.25819889 0.37796447 0.25819889 -0.25819889]
mean value: 0.3212882006354006
key: train_mcc
value: [0.54521744 0.52542209 0.52542209 0.53924899 0.54085245 0.53924899
0.52790958 0.54772256 0.58722022 0.60697698]
mean value: 0.5485241391056351
key: test_accuracy
value: [0.66666667 0.77777778 0.77777778 0.66666667 0.55555556 0.66666667
0.625 0.625 0.625 0.375 ]
mean value: 0.6361111111111111
key: train_accuracy
value: [0.72727273 0.71428571 0.71428571 0.72727273 0.74025974 0.72727273
0.71794872 0.73076923 0.75641026 0.76923077]
mean value: 0.7325008325008325
key: test_fscore
value: [0.4 0.66666667 0.66666667 0.57142857 0.5 0.66666667
0.57142857 0.4 0.57142857 0.28571429]
mean value: 0.53
key: train_fscore
value: [0.63157895 0.60714286 0.60714286 0.61818182 0.65517241 0.61818182
0.60714286 0.63157895 0.6779661 0.7 ]
mean value: 0.6354088618017069
key: test_precision
value: [1. 1. 1. 1. 0.66666667 0.75
0.66666667 1. 0.66666667 0.33333333]
mean value: 0.8083333333333333
key: train_precision
value: [1. 1. 1. 1. 0.95 1. 1. 1. 1. 1. ]
mean value: 0.995
key: test_recall
value: [0.25 0.5 0.5 0.4 0.4 0.6 0.5 0.25 0.5 0.25]
mean value: 0.415
key: train_recall
value: [0.46153846 0.43589744 0.43589744 0.44736842 0.5 0.44736842
0.43589744 0.46153846 0.51282051 0.53846154]
mean value: 0.4676788124156545
key: test_roc_auc
value: [0.625 0.75 0.75 0.7 0.575 0.675 0.625 0.625 0.625 0.375]
mean value: 0.6325
key: train_roc_auc
value: [0.73076923 0.71794872 0.71794872 0.72368421 0.73717949 0.72368421
0.71794872 0.73076923 0.75641026 0.76923077]
mean value: 0.732557354925776
key: test_jcc
value: [0.25 0.5 0.5 0.4 0.33333333 0.5
0.4 0.25 0.4 0.16666667]
mean value: 0.37
key: train_jcc
value: [0.46153846 0.43589744 0.43589744 0.44736842 0.48717949 0.44736842
0.43589744 0.46153846 0.51282051 0.53846154]
mean value: 0.46639676113360323
MCC on Blind test: 0.08
Accuracy on Blind test: 0.73
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00678349 0.00722671 0.00688362 0.00727487 0.00721765 0.00729203
0.00733399 0.00746918 0.00723958 0.00735378]
mean value: 0.007207489013671875
key: score_time
value: [0.00807953 0.00803757 0.00839972 0.00815082 0.00826144 0.0078733
0.00901914 0.0084033 0.00845194 0.00837016]
mean value: 0.008304691314697266
key: test_mcc
value: [-0.31622777 0.63245553 0.63245553 0.15811388 0.31622777 0.35
0.57735027 0.77459667 -0.25819889 0. ]
mean value: 0.2866772995759719
key: train_mcc
value: [0.53279352 0.50745677 0.5064147 0.45639039 0.53591229 0.42943967
0.51298918 0.46537892 0.59684919 0.41367015]
mean value: 0.49572947743384127
key: test_accuracy
value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667
0.75 0.875 0.375 0.5 ]
mean value: 0.6277777777777778
key: train_accuracy
value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571
0.75641026 0.73076923 0.79487179 0.70512821]
mean value: 0.7467698967698968
key: test_fscore
value: [0.4 0.8 0.8 0.5 0.72727273 0.66666667
0.66666667 0.85714286 0.44444444 0.5 ]
mean value: 0.6362193362193362
key: train_fscore
value: [0.775 0.7654321 0.75949367 0.73417722 0.775 0.71794872
0.75949367 0.74698795 0.80952381 0.72289157]
mean value: 0.7565948701272275
key: test_precision
value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75
1. 1. 0.4 0.5 ]
mean value: 0.665
key: train_precision
value: [0.75609756 0.73809524 0.75 0.70731707 0.73809524 0.7
0.75 0.70454545 0.75555556 0.68181818]
mean value: 0.7281524302256009
key: test_recall
value: [0.5 1. 1. 0.4 0.8 0.6 0.5 0.75 0.5 0.5 ]
mean value: 0.655
key: train_recall
value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211
0.76923077 0.79487179 0.87179487 0.76923077]
mean value: 0.7879892037786774
key: test_roc_auc
value: [0.35 0.8 0.8 0.575 0.65 0.675 0.75 0.875 0.375 0.5 ]
mean value: 0.635
key: train_roc_auc
value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691 0.7145749
0.75641026 0.73076923 0.79487179 0.70512821]
mean value: 0.7467948717948718
key: test_jcc
value: [0.25 0.66666667 0.66666667 0.33333333 0.57142857 0.5
0.5 0.75 0.28571429 0.33333333]
mean value: 0.4857142857142857
key: train_jcc
value: [0.63265306 0.62 0.6122449 0.58 0.63265306 0.56
0.6122449 0.59615385 0.68 0.56603774]
mean value: 0.609198750037025
MCC on Blind test: 0.08
Accuracy on Blind test: 0.5
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00730515 0.00695133 0.00699258 0.00709629 0.00697374 0.00705719
0.0069747 0.00722718 0.00697184 0.00727534]
mean value: 0.007082533836364746
key: score_time
value: [0.00949669 0.00922227 0.00921607 0.00929904 0.00929213 0.00919104
0.00917578 0.00927401 0.00928402 0.00928831]
mean value: 0.009273934364318847
key: test_mcc
value: [-0.15811388 0.15811388 0.8 0.63245553 0.31622777 0.8
0.5 0.25819889 0.25819889 0.57735027]
mean value: 0.4142431346734462
key: train_mcc
value: [0.58541539 0.66239043 0.61039852 0.61039852 0.55870445 0.61066127
0.64187021 0.64102564 0.62050523 0.56577895]
mean value: 0.6107148606822335
key: test_accuracy
value: [0.44444444 0.55555556 0.88888889 0.77777778 0.66666667 0.88888889
0.75 0.625 0.625 0.75 ]
mean value: 0.6972222222222222
key: train_accuracy
value: [0.79220779 0.83116883 0.80519481 0.80519481 0.77922078 0.80519481
0.82051282 0.82051282 0.80769231 0.78205128]
mean value: 0.804895104895105
key: test_fscore
value: [0.28571429 0.6 0.88888889 0.75 0.72727273 0.88888889
0.75 0.57142857 0.57142857 0.66666667]
mean value: 0.6700288600288601
key: train_fscore
value: [0.78947368 0.83544304 0.81012658 0.8 0.77922078 0.80519481
0.825 0.82051282 0.81927711 0.79012346]
mean value: 0.8074372274615954
key: test_precision
value: [0.33333333 0.5 0.8 1. 0.66666667 1.
0.75 0.66666667 0.66666667 1. ]
mean value: 0.7383333333333333
key: train_precision
value: [0.81081081 0.825 0.8 0.81081081 0.76923077 0.79487179
0.80487805 0.82051282 0.77272727 0.76190476]
mean value: 0.7970747089649529
key: test_recall
value: [0.25 0.75 1. 0.6 0.8 0.8 0.75 0.5 0.5 0.5 ]
mean value: 0.645
key: train_recall
value: [0.76923077 0.84615385 0.82051282 0.78947368 0.78947368 0.81578947
0.84615385 0.82051282 0.87179487 0.82051282]
mean value: 0.8189608636977058
key: test_roc_auc
value: [0.425 0.575 0.9 0.8 0.65 0.9 0.75 0.625 0.625 0.75 ]
mean value: 0.7
key: train_roc_auc
value: [0.79251012 0.83097166 0.80499325 0.80499325 0.77935223 0.80533063
0.82051282 0.82051282 0.80769231 0.78205128]
mean value: 0.8048920377867747
key: test_jcc
value: [0.16666667 0.42857143 0.8 0.6 0.57142857 0.8
0.6 0.4 0.4 0.5 ]
mean value: 0.5266666666666666
key: train_jcc
value: [0.65217391 0.7173913 0.68085106 0.66666667 0.63829787 0.67391304
0.70212766 0.69565217 0.69387755 0.65306122]
mean value: 0.6774012472704161
MCC on Blind test: 0.06
Accuracy on Blind test: 0.65
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00880122 0.00814819 0.00756073 0.00777268 0.00736904 0.00765371
0.00792193 0.00778627 0.00790763 0.00790238]
mean value: 0.007882380485534668
key: score_time
value: [0.0086937 0.00912237 0.00853276 0.00825572 0.00864816 0.00843048
0.0086689 0.00874853 0.00819087 0.00858378]
mean value: 0.008587527275085449
key: test_mcc
value: [0.35 0.1 0.8 1. 0.5976143 0.8
0.77459667 1. 0. 0.77459667]
mean value: 0.6196807643150164
key: train_mcc
value: [0.84516739 0.82485566 0.84852502 0.848923 0.79675455 0.87044534
0.74456944 0.8720816 0.77563153 0.84726867]
mean value: 0.8274222215949533
key: test_accuracy
value: [0.66666667 0.55555556 0.88888889 1. 0.77777778 0.88888889
0.875 1. 0.5 0.875 ]
mean value: 0.8027777777777778
key: train_accuracy
value: [0.92207792 0.90909091 0.92207792 0.92207792 0.8961039 0.93506494
0.87179487 0.93589744 0.88461538 0.92307692]
mean value: 0.9121878121878122
key: test_fscore
value: [0.66666667 0.5 0.88888889 1. 0.83333333 0.88888889
0.85714286 1. 0.5 0.85714286]
mean value: 0.7992063492063491
key: train_fscore
value: [0.925 0.91566265 0.92682927 0.925 0.9 0.93506494
0.875 0.93670886 0.89156627 0.925 ]
mean value: 0.9155831979779763
key: test_precision
value: [0.6 0.5 0.8 1. 0.71428571 1.
1. 1. 0.5 1. ]
mean value: 0.8114285714285714
key: train_precision
value: [0.90243902 0.86363636 0.88372093 0.88095238 0.85714286 0.92307692
0.85365854 0.925 0.84090909 0.90243902]
mean value: 0.8832975131316028
key: test_recall
value: [0.75 0.5 1. 1. 1. 0.8 0.75 1. 0.5 0.75]
mean value: 0.805
key: train_recall
value: [0.94871795 0.97435897 0.97435897 0.97368421 0.94736842 0.94736842
0.8974359 0.94871795 0.94871795 0.94871795]
mean value: 0.950944669365722
key: test_roc_auc
value: [0.675 0.55 0.9 1. 0.75 0.9 0.875 1. 0.5 0.875]
mean value: 0.8025
key: train_roc_auc
value: [0.9217274 0.90823212 0.92139001 0.92273954 0.89676113 0.93522267
0.87179487 0.93589744 0.88461538 0.92307692]
mean value: 0.9121457489878543
key: test_jcc
value: [0.5 0.33333333 0.8 1. 0.71428571 0.8
0.75 1. 0.33333333 0.75 ]
mean value: 0.6980952380952381
key: train_jcc
value: [0.86046512 0.84444444 0.86363636 0.86046512 0.81818182 0.87804878
0.77777778 0.88095238 0.80434783 0.86046512]
mean value: 0.8448784740404756
MCC on Blind test: 0.09
Accuracy on Blind test: 0.52
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.32428074 0.29084134 0.39761615 0.38947153 0.38101339 0.47083735
0.4072361 0.57479668 0.46769285 0.39441442]
mean value: 0.40982005596160886
key: score_time
value: [0.01101065 0.01088691 0.01111293 0.01090026 0.0111537 0.01554298
0.0109508 0.01096511 0.01096678 0.01099515]
mean value: 0.011448526382446289
key: test_mcc
value: [0.1 0.35 0.8 0.8 0.31622777 0.8
0.5 0.77459667 0.25819889 0.77459667]
mean value: 0.5473619994246965
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.55555556 0.66666667 0.88888889 0.88888889 0.66666667 0.88888889
0.75 0.875 0.625 0.875 ]
mean value: 0.7680555555555555
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.66666667 0.88888889 0.88888889 0.72727273 0.88888889
0.75 0.88888889 0.57142857 0.85714286]
mean value: 0.7628066378066378
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0.6 0.8 1. 0.66666667 1.
0.75 0.8 0.66666667 1. ]
mean value: 0.7783333333333333
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.5 0.75 1. 0.8 0.8 0.8 0.75 1. 0.5 0.75]
mean value: 0.765
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.55 0.675 0.9 0.9 0.65 0.9 0.75 0.875 0.625 0.875]
mean value: 0.77
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.5 0.8 0.8 0.57142857 0.8
0.6 0.8 0.4 0.75 ]
mean value: 0.6354761904761905
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.54
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.00931001 0.00951576 0.00988293 0.00786901 0.00728393 0.01152086
0.00701046 0.00675249 0.00750613 0.01119971]
mean value: 0.008785128593444824
key: score_time
value: [0.01047301 0.01033044 0.00877452 0.00874352 0.00872278 0.01278138
0.00794363 0.00788903 0.00790906 0.0122633 ]
mean value: 0.009583067893981934
key: test_mcc
value: [0.63245553 1. 1. 1. 0.63245553 1.
1. 1. 0.77459667 1. ]
mean value: 0.9039507733308835
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.77777778 1. 1. 1. 0.77777778 1.
1. 1. 0.875 1. ]
mean value: 0.9430555555555555
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 1. 1. 1. 0.75 1.
1. 1. 0.85714286 1. ]
mean value: 0.9407142857142857
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ]
mean value: 0.935
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ]
mean value: 0.9475
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 1. 1. 1. 0.6 1.
1. 1. 0.75 1. ]
mean value: 0.9016666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.07991552 0.07531953 0.07922387 0.07759547 0.07554555 0.07641506
0.08248544 0.07638526 0.07602501 0.07815456]
mean value: 0.07770652770996093
key: score_time
value: [0.01661062 0.01769233 0.01668501 0.01735091 0.01669407 0.01689482
0.01721072 0.01734948 0.0171845 0.01668453]
mean value: 0.017035698890686034
key: test_mcc
value: [0.55 0.35 0.8 0.8 0.79056942 0.8
1. 0.77459667 0.25819889 1. ]
mean value: 0.7123364974030739
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889
1. 0.875 0.625 1. ]
mean value: 0.85
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75 0.66666667 0.88888889 0.88888889 0.90909091 0.88888889
1. 0.88888889 0.57142857 1. ]
mean value: 0.8452741702741703
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.75 0.6 0.8 1. 0.83333333 1.
1. 0.8 0.66666667 1. ]
mean value: 0.845
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.75 1. 0.8 1. 0.8 1. 1. 0.5 1. ]
mean value: 0.86
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.775 0.675 0.9 0.9 0.875 0.9 1. 0.875 0.625 1. ]
mean value: 0.8525
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.6 0.5 0.8 0.8 0.83333333 0.8
1. 0.8 0.4 1. ]
mean value: 0.7533333333333334
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.05
Accuracy on Blind test: 0.63
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00761938 0.00663662 0.00656319 0.00661063 0.00666404 0.00655985
0.006706 0.00655651 0.00683618 0.00663257]
mean value: 0.006738495826721191
key: score_time
value: [0.00826526 0.00768614 0.0077951 0.00779033 0.00775886 0.00782132
0.00778651 0.00774288 0.00777555 0.00771451]
mean value: 0.007813644409179688
key: test_mcc
value: [ 0.35 0.1 -0.15811388 0.1 -0.1 -0.5976143
0.25819889 0. 0. 0.25819889]
mean value: 0.021066959181870643
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.66666667 0.55555556 0.44444444 0.55555556 0.44444444 0.22222222
0.625 0.5 0.5 0.625 ]
mean value: 0.5138888888888888
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.5 0.28571429 0.6 0.44444444 0.
0.57142857 0.5 0.5 0.57142857]
mean value: 0.463968253968254
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.5 0.33333333 0.6 0.5 0.
0.66666667 0.5 0.5 0.66666667]
mean value: 0.48666666666666664
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.5 0.25 0.6 0.4 0. 0.5 0.5 0.5 0.5 ]
mean value: 0.45
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.675 0.55 0.425 0.55 0.45 0.25 0.625 0.5 0.5 0.625]
mean value: 0.515
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.33333333 0.16666667 0.42857143 0.28571429 0.
0.4 0.33333333 0.33333333 0.4 ]
mean value: 0.3180952380952381
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.52
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [0.95179796 0.94567418 0.94950175 0.96822572 0.93672466 0.96356082
0.97707057 1.03818297 1.03070283 1.00553799]
mean value: 0.9766979455947876
key: score_time
value: [0.09188795 0.09431767 0.08775377 0.08800793 0.09016871 0.08714199
0.09610558 0.09580159 0.09604168 0.09132028]
mean value: 0.09185471534729003
key: test_mcc
value: [0.8 0.55 0.8 1. 0.55 1.
1. 0.77459667 0.77459667 1. ]
mean value: 0.8249193338482967
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 0.77777778 0.88888889 1. 0.77777778 1.
1. 0.875 0.875 1. ]
mean value: 0.9083333333333333
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.75 0.88888889 1. 0.8 1.
1. 0.88888889 0.85714286 1. ]
mean value: 0.9073809523809524
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.75 0.8 1. 0.8 1. 1. 0.8 1. 1. ]
mean value: 0.895
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.75 1. 1. 0.8 1. 1. 1. 0.75 1. ]
mean value: 0.93
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.775 0.9 1. 0.775 1. 1. 0.875 0.875 1. ]
mean value: 0.91
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.6 0.8 1. 0.66666667 1.
1. 0.8 0.75 1. ]
mean value: 0.8416666666666667
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
  warn(
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.75
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.92752123 0.89317346 0.80733633 0.90238857 0.82283401 0.9298923
0.91012096 0.82151413 0.85942483 0.84982991]
mean value: 0.8724035739898681
key: score_time
value: [0.19612741 0.17702603 0.17312717 0.23580909 0.18489385 0.20000648
0.20895576 0.13845778 0.27115655 0.17170072]
mean value: 0.19572608470916747
key: test_mcc
value: [0.35 0.55 0.8 1. 0.55 1.
1. 0.5 0.77459667 1. ]
mean value: 0.7524596669241483
key: train_mcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.97467943 1. ]
mean value: 0.9974679434480896
key: test_accuracy
value: [0.66666667 0.77777778 0.88888889 1. 0.77777778 1.
1. 0.75 0.875 1. ]
mean value: 0.8736111111111111
key: train_accuracy
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.98717949 1. ]
mean value: 0.9987179487179487
key: test_fscore
value: [0.66666667 0.75 0.88888889 1. 0.8 1.
1. 0.75 0.85714286 1. ]
mean value: 0.8712698412698413
key: train_fscore
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.98701299 1. ]
mean value: 0.9987012987012986
key: test_precision
value: [0.6 0.75 0.8 1. 0.8 1. 1. 0.75 1. 1. ]
mean value: 0.87
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.75 1. 1. 0.8 1. 1. 0.75 0.75 1. ]
mean value: 0.88
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.97435897 1. ]
mean value: 0.9974358974358974
key: test_roc_auc
value: [0.675 0.775 0.9 1. 0.775 1. 1. 0.75 0.875 1. ]
mean value: 0.875
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.98717949 1. ]
mean value: 0.9987179487179487
key: test_jcc
value: [0.5 0.6 0.8 1. 0.66666667 1.
1. 0.6 0.75 1. ]
mean value: 0.7916666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.97435897 1. ]
mean value: 0.9974358974358974
MCC on Blind test: 0.14
Accuracy on Blind test: 0.74
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01836395 0.00678182 0.00684881 0.00723457 0.00742078 0.00704598
0.00751448 0.00682497 0.00756073 0.0067997 ]
mean value: 0.00823957920074463
key: score_time
value: [0.01078892 0.00827122 0.00816345 0.00804162 0.00847411 0.00859022
0.00832844 0.00801682 0.00818849 0.00833344]
mean value: 0.008519673347473144
key: test_mcc
value: [-0.31622777 0.63245553 0.63245553 0.15811388 0.31622777 0.35
0.57735027 0.77459667 -0.25819889 0. ]
mean value: 0.2866772995759719
key: train_mcc
value: [0.53279352 0.50745677 0.5064147 0.45639039 0.53591229 0.42943967
0.51298918 0.46537892 0.59684919 0.41367015]
mean value: 0.49572947743384127
key: test_accuracy
value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667
0.75 0.875 0.375 0.5 ]
mean value: 0.6277777777777778
key: train_accuracy
value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571
0.75641026 0.73076923 0.79487179 0.70512821]
mean value: 0.7467698967698968
key: test_fscore
value: [0.4 0.8 0.8 0.5 0.72727273 0.66666667
0.66666667 0.85714286 0.44444444 0.5 ]
mean value: 0.6362193362193362
key: train_fscore
value: [0.775 0.7654321 0.75949367 0.73417722 0.775 0.71794872
0.75949367 0.74698795 0.80952381 0.72289157]
mean value: 0.7565948701272275
key: test_precision
value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75
1. 1. 0.4 0.5 ]
mean value: 0.665
key: train_precision
value: [0.75609756 0.73809524 0.75 0.70731707 0.73809524 0.7
0.75 0.70454545 0.75555556 0.68181818]
mean value: 0.7281524302256009
key: test_recall
value: [0.5 1. 1. 0.4 0.8 0.6 0.5 0.75 0.5 0.5 ]
mean value: 0.655
key: train_recall
value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211
0.76923077 0.79487179 0.87179487 0.76923077]
mean value: 0.7879892037786774
key: test_roc_auc
value: [0.35 0.8 0.8 0.575 0.65 0.675 0.75 0.875 0.375 0.5 ]
mean value: 0.635
key: train_roc_auc
value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691 0.7145749
0.75641026 0.73076923 0.79487179 0.70512821]
mean value: 0.7467948717948718
key: test_jcc
value: [0.25 0.66666667 0.66666667 0.33333333 0.57142857 0.5
0.5 0.75 0.28571429 0.33333333]
mean value: 0.4857142857142857
key: train_jcc
value: [0.63265306 0.62 0.6122449 0.58 0.63265306 0.56
0.6122449 0.59615385 0.68 0.56603774]
mean value: 0.609198750037025
MCC on Blind test: 0.08
Accuracy on Blind test: 0.5
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.0487535 0.02997184 0.05316138 0.02903056 0.02904987 0.0322907
0.18433547 0.02987671 0.02673578 0.02919006]
mean value: 0.049239587783813474
key: score_time
value: [0.01463509 0.00997066 0.00984716 0.00954914 0.00983858 0.01014209
0.01031256 0.01060939 0.01125598 0.00966144]
mean value: 0.010582208633422852
key: test_mcc
value: [0.8 1. 1. 1. 0.8 1.
1. 0.77459667 0.57735027 1. ]
mean value: 0.8951946938431109
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 1. 1. 1. 0.88888889 1.
1. 0.875 0.75 1. ]
mean value: 0.9402777777777778
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 1. 1. 0.88888889 1.
1. 0.88888889 0.66666667 1. ]
mean value: 0.9333333333333333
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 1. 1. 1. 1. 1. 1. 0.8 1. 1. ]
mean value: 0.96
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.8 1. 1. 1. 0.5 1. ]
mean value: 0.93
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 1. 1. 1. 0.9 1. 1. 0.875 0.75 1. ]
mean value: 0.9425
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 1. 1. 0.8 1. 1. 0.8 0.5 1. ]
mean value: 0.89
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.09
Accuracy on Blind test: 0.77
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.00983548 0.01052451 0.0106616 0.01091266 0.010957 0.01091719
0.01095319 0.01184487 0.01091933 0.01090574]
mean value: 0.010843157768249512
key: score_time
value: [0.0105567 0.01009798 0.01041722 0.01042056 0.01047301 0.01049972
0.01047277 0.01058149 0.01053119 0.01043415]
mean value: 0.010448479652404785
key: test_mcc
value: [0.8 0.8 0.8 0.79056942 1. 0.55
1. 0.77459667 0.25819889 0.57735027]
mean value: 0.7350715243220365
key: train_mcc
value: [1. 1. 0.97434188 0.97435897 1. 0.97435897
0.94996791 1. 1. 1. ]
mean value: 0.987302773890115
key: test_accuracy
value: [0.88888889 0.88888889 0.88888889 0.88888889 1. 0.77777778
1. 0.875 0.625 0.75 ]
mean value: 0.8583333333333333
key: train_accuracy
value: [1. 1. 0.98701299 0.98701299 1. 0.98701299
0.97435897 1. 1. 1. ]
mean value: 0.9935397935397935
key: test_fscore
value: [0.88888889 0.88888889 0.88888889 0.90909091 1. 0.8
1. 0.88888889 0.57142857 0.66666667]
mean value: 0.8502741702741703
key: train_fscore
value: [1. 1. 0.98734177 0.98701299 1. 0.98701299
0.975 1. 1. 1. ]
mean value: 0.9936367746177872
key: test_precision
value: [0.8 0.8 0.8 0.83333333 1. 0.8
1. 0.8 0.66666667 1. ]
mean value: 0.85
key: train_precision
value: [1. 1. 0.975 0.97435897 1. 0.97435897
0.95121951 1. 1. 1. ]
mean value: 0.987493746091307
key: test_recall
value: [1. 1. 1. 1. 1. 0.8 1. 1. 0.5 0.5]
mean value: 0.88
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.9 0.9 0.875 1. 0.775 1. 0.875 0.625 0.75 ]
mean value: 0.86
key: train_roc_auc
value: [1. 1. 0.98684211 0.98717949 1. 0.98717949
0.97435897 1. 1. 1. ]
mean value: 0.9935560053981106
key: test_jcc
value: [0.8 0.8 0.8 0.83333333 1. 0.66666667
1. 0.8 0.4 0.5 ]
mean value: 0.76
key: train_jcc
value: [1. 1. 0.975 0.97435897 1. 0.97435897
0.95121951 1. 1. 1. ]
mean value: 0.987493746091307
MCC on Blind test: 0.06
Accuracy on Blind test: 0.67
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00919414 0.00788784 0.00760937 0.00748897 0.00734115 0.00743103
0.00754428 0.00746417 0.00744605 0.00730228]
mean value: 0.007670927047729492
key: score_time
value: [0.01072121 0.0088973 0.00895858 0.00848174 0.00847363 0.00847983
0.00860381 0.00846124 0.00856113 0.00795674]
mean value: 0.008759522438049316
key: test_mcc
value: [0.55 0.1 0.8 0.8 0.31622777 0.55
0.57735027 0.57735027 0. 0.57735027]
mean value: 0.4848278573585716
key: train_mcc
value: [0.61257733 0.66463964 0.6374073 0.55962522 0.63928106 0.58485583
0.64102564 0.56577895 0.66688593 0.56428809]
mean value: 0.6136364988556005
key: test_accuracy
value: [0.77777778 0.55555556 0.88888889 0.88888889 0.66666667 0.77777778
0.75 0.75 0.5 0.75 ]
mean value: 0.7305555555555555
key: train_accuracy
value: [0.80519481 0.83116883 0.81818182 0.77922078 0.81818182 0.79220779
0.82051282 0.78205128 0.83333333 0.78205128]
mean value: 0.8062104562104563
key: test_fscore
value: [0.75 0.5 0.88888889 0.88888889 0.72727273 0.8
0.66666667 0.66666667 0.5 0.66666667]
mean value: 0.7055050505050505
key: train_fscore
value: [0.8 0.82666667 0.81578947 0.76712329 0.80555556 0.78378378
0.82051282 0.77333333 0.83544304 0.78481013]
mean value: 0.8013018085764565
key: test_precision
value: [0.75 0.5 0.8 1. 0.66666667 0.8
1. 1. 0.5 1. ]
mean value: 0.8016666666666666
key: train_precision
value: [0.83333333 0.86111111 0.83783784 0.8 0.85294118 0.80555556
0.82051282 0.80555556 0.825 0.775 ]
mean value: 0.8216847390376802
key: test_recall
value: [0.75 0.5 1. 0.8 0.8 0.8 0.5 0.5 0.5 0.5 ]
mean value: 0.665
key: train_recall
value: [0.76923077 0.79487179 0.79487179 0.73684211 0.76315789 0.76315789
0.82051282 0.74358974 0.84615385 0.79487179]
mean value: 0.7827260458839406
key: test_roc_auc
value: [0.775 0.55 0.9 0.9 0.65 0.775 0.75 0.75 0.5 0.75 ]
mean value: 0.73
key: train_roc_auc
value: [0.80566802 0.83164642 0.81848853 0.77867746 0.81747638 0.79183536
0.82051282 0.78205128 0.83333333 0.78205128]
mean value: 0.8061740890688258
key: test_jcc
value: [0.6 0.33333333 0.8 0.8 0.57142857 0.66666667
0.5 0.5 0.33333333 0.5 ]
mean value: 0.5604761904761905
key: train_jcc
value: [0.66666667 0.70454545 0.68888889 0.62222222 0.6744186 0.64444444
0.69565217 0.63043478 0.7173913 0.64583333]
mean value: 0.6690497875621738
MCC on Blind test: 0.09
Accuracy on Blind test: 0.56
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00780702 0.00762272 0.00795984 0.00789499 0.00773883 0.00785589
0.00779939 0.00809336 0.00800991 0.00795817]
mean value: 0.007874011993408203
key: score_time
value: [0.00883174 0.00864387 0.00875735 0.00865006 0.00847197 0.00864172
0.00876093 0.00859165 0.00871468 0.00850964]
mean value: 0.008657360076904297
key: test_mcc
value: [0.31622777 0.15811388 0.8 0.79056942 1. 1.
0.77459667 0.5 0.5 1. ]
mean value: 0.6839507733308835
key: train_mcc
value: [1. 0.92480439 0.94935876 1. 0.94804318 0.94935876
0.87904907 0.85634884 0.97467943 0.90219371]
mean value: 0.93838361447064
key: test_accuracy
value: [0.66666667 0.55555556 0.88888889 0.88888889 1. 1.
0.875 0.75 0.75 1. ]
mean value: 0.8375
key: train_accuracy
value: [1. 0.96103896 0.97402597 1. 0.97402597 0.97402597
0.93589744 0.92307692 0.98717949 0.94871795]
mean value: 0.9677988677988678
key: test_fscore
value: [0.57142857 0.6 0.88888889 0.90909091 1. 1.
0.88888889 0.75 0.75 1. ]
mean value: 0.8358297258297258
key: train_fscore
value: [1. 0.96296296 0.97368421 1. 0.97368421 0.97435897
0.93975904 0.92857143 0.98734177 0.95121951]
mean value: 0.9691582107437596
key: test_precision
value: [0.66666667 0.5 0.8 0.83333333 1. 1.
0.8 0.75 0.75 1. ]
mean value: 0.81
key: train_precision
value: [1. 0.92857143 1. 1. 0.97368421 0.95
0.88636364 0.86666667 0.975 0.90697674]
mean value: 0.9487262686314094
key: test_recall
value: [0.5 0.75 1. 1. 1. 1. 1. 0.75 0.75 1. ]
mean value: 0.875
key: train_recall
value: [1. 1. 0.94871795 1. 0.97368421 1.
1. 1. 1. 1. ]
mean value: 0.9922402159244265
key: test_roc_auc
value: [0.65 0.575 0.9 0.875 1. 1. 0.875 0.75 0.75 1. ]
mean value: 0.8375
key: train_roc_auc
value: [1. 0.96052632 0.97435897 1. 0.97402159 0.97435897
0.93589744 0.92307692 0.98717949 0.94871795]
mean value: 0.9678137651821862
key: test_jcc
value: [0.4 0.42857143 0.8 0.83333333 1. 1.
0.8 0.6 0.6 1. ]
mean value: 0.7461904761904762
key: train_jcc
value: [1. 0.92857143 0.94871795 1. 0.94871795 0.95
0.88636364 0.86666667 0.975 0.90697674]
mean value: 0.9411014373223675
MCC on Blind test: 0.09
Accuracy on Blind test: 0.53
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00939584 0.00922012 0.00787425 0.00773025 0.00760698 0.00704384
0.00760889 0.00750351 0.00746274 0.00742221]
mean value: 0.007886862754821778
key: score_time
value: [0.01045752 0.01001 0.00872231 0.0086472 0.00853968 0.00859547
0.0085783 0.00867987 0.00851488 0.008075 ]
mean value: 0.0088820219039917
key: test_mcc
value: [0.5976143 0.55 0.8 0.79056942 0.63245553 1.
0.77459667 0.5 0. 1. ]
mean value: 0.6645235920984451
key: train_mcc
value: [0.90109146 1. 0.97435897 0.90109146 0.70243936 0.75611265
0.94996791 1. 0.46770717 0.9258201 ]
mean value: 0.8578589074717703
key: test_accuracy
value: [0.77777778 0.77777778 0.88888889 0.88888889 0.77777778 1.
0.875 0.75 0.5 1. ]
mean value: 0.8236111111111111
key: train_accuracy
value: [0.94805195 1. 0.98701299 0.94805195 0.83116883 0.87012987
0.97435897 1. 0.67948718 0.96153846]
mean value: 0.91998001998002
key: test_fscore
value: [0.66666667 0.75 0.88888889 0.90909091 0.75 1.
0.88888889 0.75 0.6 1. ]
mean value: 0.8203535353535354
key: train_fscore
value: [0.94594595 1. 0.98701299 0.95 0.79365079 0.85294118
0.975 1. 0.75728155 0.96296296]
mean value: 0.9224795419441336
key: test_precision
value: [1. 0.75 0.8 0.83333333 1. 1.
0.8 0.75 0.5 1. ]
mean value: 0.8433333333333334
key: train_precision
value: [1. 1. 1. 0.9047619 1. 0.96666667
0.95121951 1. 0.609375 0.92857143]
mean value: 0.9360594512195122
key: test_recall
value: [0.5 0.75 1. 1. 0.6 1. 1. 0.75 0.75 1. ]
mean value: 0.835
key: train_recall
value: [0.8974359 1. 0.97435897 1. 0.65789474 0.76315789
1. 1. 1. 1. ]
mean value: 0.9292847503373819
key: test_roc_auc
value: [0.75 0.775 0.9 0.875 0.8 1. 0.875 0.75 0.5 1. ]
mean value: 0.8225
key: train_roc_auc
value: [0.94871795 1. 0.98717949 0.94871795 0.82894737 0.86875843
0.97435897 1. 0.67948718 0.96153846]
mean value: 0.9197705802968961
key: test_jcc
value: [0.5 0.6 0.8 0.83333333 0.6 1.
0.8 0.6 0.42857143 1. ]
mean value: 0.7161904761904762
key: train_jcc
value: [0.8974359 1. 0.97435897 0.9047619 0.65789474 0.74358974
0.95121951 1. 0.609375 0.92857143]
mean value: 0.8667207197755176
MCC on Blind test: 0.11
Accuracy on Blind test: 0.65
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.07140326 0.05798793 0.05990863 0.05765224 0.0598948 0.06153941
0.05800366 0.05805969 0.05758524 0.05807018]
mean value: 0.06001050472259521
key: score_time
value: [0.0154562 0.01460052 0.01546645 0.01415467 0.01506758 0.01572824
0.01406693 0.01429629 0.01435161 0.01549411]
mean value: 0.01486825942993164
key: test_mcc
value: [0.8 1. 1. 1. 0.63245553 1.
1. 1. 0.77459667 1. ]
mean value: 0.9207052201275159
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88888889 1. 1. 1. 0.77777778 1.
1. 1. 0.875 1. ]
mean value: 0.9541666666666666
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 1. 1. 1. 0.75 1.
1. 1. 0.88888889 1. ]
mean value: 0.9527777777777777
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 1. 1. 1. 1. 1. 1. 1. 0.8 1. ]
mean value: 0.96
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.6 1. 1. 1. 1. 1. ]
mean value: 0.96
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ]
mean value: 0.9575
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 1. 1. 1. 0.6 1. 1. 1. 0.8 1. ]
mean value: 0.92
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.81
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.02970028 0.02384472 0.02287769 0.03229713 0.02531958 0.02179098
0.02337933 0.03095937 0.02274895 0.02188182]
mean value: 0.025479984283447266
key: score_time
value: [0.0158186 0.0168395 0.01607776 0.02253652 0.02111721 0.01882815
0.02102876 0.02306414 0.01536655 0.0155549 ]
mean value: 0.018623208999633788
key: test_mcc
value: [0.8 1. 1. 1. 0.63245553 1.
1. 0.77459667 0.57735027 1. ]
mean value: 0.8784402470464785
key: train_mcc
value: [1. 1. 0.97435897 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
key: test_accuracy
value: [0.88888889 1. 1. 1. 0.77777778 1.
1. 0.875 0.75 1. ]
mean value: 0.9291666666666667
key: train_accuracy
value: [1. 1. 0.98701299 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9987012987012986
key: test_fscore
value: [0.88888889 1. 1. 1. 0.75 1.
1. 0.88888889 0.66666667 1. ]
mean value: 0.9194444444444444
key: train_fscore
value: [1. 1. 0.98701299 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9987012987012986
key: test_precision
value: [0.8 1. 1. 1. 1. 1. 1. 0.8 1. 1. ]
mean value: 0.96
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.5 1. ]
mean value: 0.91
key: train_recall
value: [1. 1. 0.97435897 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
key: test_roc_auc
value: [0.9 1. 1. 1. 0.8 1. 1. 0.875 0.75 1. ]
mean value: 0.9325
key: train_roc_auc
value: [1. 1. 0.98717949 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9987179487179487
key: test_jcc
value: [0.8 1. 1. 1. 0.6 1. 1. 0.8 0.5 1. ]
mean value: 0.87
key: train_jcc
value: [1. 1. 0.97435897 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9974358974358974
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.01199746 0.01234484 0.01242924 0.01289916 0.01300168 0.01250005
0.01238441 0.01239324 0.01242089 0.01329017]
mean value: 0.012566113471984863
key: score_time
value: [0.0101943 0.01012278 0.01048732 0.01058817 0.0106461 0.01056409
0.01057553 0.01059437 0.01054907 0.01066661]
mean value: 0.010498833656311036
key: test_mcc
value: [0.55 0.35 0.8 0.79056942 0.79056942 0.8
0.77459667 0.5 0.25819889 0.77459667]
mean value: 0.6388531058314317
key: train_mcc
value: [1. 1. 0.97434188 1. 1. 0.97435897
0.97467943 1. 0.97467943 1. ]
mean value: 0.9898059726472239
key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889
0.875 0.75 0.625 0.875 ]
mean value: 0.8125
key: train_accuracy
value: [1. 1. 0.98701299 1. 1. 0.98701299
0.98717949 1. 0.98717949 1. ]
mean value: 0.9948384948384948
key: test_fscore
value: [0.75 0.66666667 0.88888889 0.90909091 0.90909091 0.88888889
0.85714286 0.75 0.57142857 0.85714286]
mean value: 0.8048340548340548
key: train_fscore
value: [1. 1. 0.98734177 1. 1. 0.98701299
0.98734177 1. 0.98701299 1. ]
mean value: 0.9948709518329771
key: test_precision
value: [0.75 0.6 0.8 0.83333333 0.83333333 1.
1. 0.75 0.66666667 1. ]
mean value: 0.8233333333333334
key: train_precision
value: [1. 1. 0.975 1. 1. 0.97435897
0.975 1. 1. 1. ]
mean value: 0.9924358974358974
key: test_recall
value: [0.75 0.75 1. 1. 1. 0.8 0.75 0.75 0.5 0.75]
mean value: 0.805
key: train_recall
value: [1. 1. 1. 1. 1. 1.
1. 1. 0.97435897 1. ]
mean value: 0.9974358974358974
key: test_roc_auc
value: [0.775 0.675 0.9 0.875 0.875 0.9 0.875 0.75 0.625 0.875]
mean value: 0.8125
key: train_roc_auc
value: [1. 1. 0.98684211 1. 1. 0.98717949
0.98717949 1. 0.98717949 1. ]
mean value: 0.9948380566801619
key: test_jcc
value: [0.6 0.5 0.8 0.83333333 0.83333333 0.8
0.75 0.6 0.4 0.75 ]
mean value: 0.6866666666666666
key: train_jcc
value: [1. 1. 0.975 1. 1. 0.97435897
0.975 1. 0.97435897 1. ]
mean value: 0.9898717948717949
MCC on Blind test: 0.1
Accuracy on Blind test: 0.59
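Note: the key / value / mean value lines in each model block have the shape of the dictionary that scikit-learn's cross_validate returns when several scorers are requested with return_train_score=True. The following is a minimal sketch under that assumption. The synthetic data, the choice of RidgeClassifier, and the names in `scoring` are illustrative and are not taken from gid_config.py.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data; the real feature matrix is not part of this log.
X, y = make_classification(n_samples=86, n_features=10, random_state=42)

# Hypothetical scorer names chosen to mirror the keys printed in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    RidgeClassifier(random_state=42),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,
)

# One array of 10 per-fold values per key, followed by its mean.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

A large gap between the train_* and test_* means, or between the cross-validated MCC and the blind-test MCC, is usually read as a sign of overfitting on a small dataset.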
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.0684216 0.06198835 0.06150103 0.06125021 0.05035377 0.06086302
0.06419754 0.05684948 0.05581045 0.06373596]
mean value: 0.06049714088439941
key: score_time
value: [0.00865507 0.00866818 0.00822926 0.008461 0.00909662 0.00912499
0.00889039 0.00892878 0.0091598 0.00874305]
mean value: 0.008795714378356934
key: test_mcc
value: [0.63245553 1. 1. 1. 0.63245553 1.
1. 1. 0.77459667 1. ]
mean value: 0.9039507733308835
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.77777778 1. 1. 1. 0.77777778 1.
1. 1. 0.875 1. ]
mean value: 0.9430555555555555
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 1. 1. 1. 0.75 1.
1. 1. 0.85714286 1. ]
mean value: 0.9407142857142857
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.66666667 1. 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9666666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ]
mean value: 0.935
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ]
mean value: 0.9475
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 1. 1. 1. 0.6 1.
1. 1. 0.75 1. ]
mean value: 0.9016666666666666
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.76
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00811529 0.00809407 0.01050186 0.00907612 0.00753951 0.0079627
0.00725293 0.00733447 0.00738692 0.00783324]
mean value: 0.008109712600708007
key: score_time
value: [0.01106501 0.01026535 0.00954747 0.00802541 0.0085423 0.00829649
0.00797725 0.00805521 0.00838113 0.00803828]
mean value: 0.008819389343261718
key: test_mcc
value: [ 0.05976143 -0.31622777 0.31622777 0. 0.47809144 -0.05976143
0.25819889 0.57735027 0. 0. ]
mean value: 0.13136406026705444
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.55555556 0.44444444 0.66666667 0.44444444 0.66666667 0.44444444
0.625 0.75 0.5 0.5 ]
mean value: 0.5597222222222222
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.33333333 0. 0.57142857 0. 0.57142857 0.28571429
0.57142857 0.66666667 0.33333333 0. ]
mean value: 0.33333333333333337
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5 0. 0.66666667 0. 1. 0.5
0.66666667 1. 0.5 0. ]
mean value: 0.48333333333333334
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.25 0. 0.5 0. 0.4 0.2 0.5 0.5 0.25 0. ]
mean value: 0.26
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.525 0.4 0.65 0.5 0.7 0.475 0.625 0.75 0.5 0.5 ]
mean value: 0.5625
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.2 0. 0.4 0. 0.4 0.16666667
0.4 0.5 0.2 0. ]
mean value: 0.22666666666666668
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.03
Accuracy on Blind test: 0.51
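Note: the UndefinedMetricWarning in the QDA run above is raised when a fold contains no positive predictions, so precision has a zero denominator. The following is a minimal sketch of the `zero_division` parameter that the warning mentions. The toy labels are invented for illustration, and whether the original script builds its scorers with make_scorer is an assumption.

```python
from sklearn.metrics import make_scorer, precision_score

# Toy labels (illustrative only): the classifier predicts no positives at all.
y_true = [0, 1, 1, 0]
y_pred = [0, 0, 0, 0]

# zero_division=0 fixes the score at 0.0 for this case and suppresses the warning.
precision = precision_score(y_true, y_pred, zero_division=0)
print(precision)

# The same keyword can be forwarded through a scorer used in cross-validation.
precision_scorer = make_scorer(precision_score, zero_division=0)
```

The reported value does not change (the warning already states that 0.0 is used), so this only makes the choice explicit and keeps the log quieter.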
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01002216 0.0099225 0.00757122 0.00743675 0.00746179 0.00744224
0.00728893 0.00753307 0.00745034 0.00752497]
mean value: 0.007965397834777833
key: score_time
value: [0.01054311 0.00975561 0.008003 0.00801349 0.00793147 0.0078249
0.00790191 0.00796008 0.00788474 0.00783968]
mean value: 0.008365797996520995
key: test_mcc
value: [0.55 0.35 0.8 1. 1. 1.
0.77459667 0.5 0.5 1. ]
mean value: 0.7474596669241483
key: train_mcc
value: [0.97435897 0.94804318 0.89608637 0.92240216 0.94804318 0.92240216
0.89861829 0.94871795 1. 0.97467943]
mean value: 0.9433351706022929
key: test_accuracy
value: [0.77777778 0.66666667 0.88888889 1. 1. 1.
0.875 0.75 0.75 1. ]
mean value: 0.8708333333333333
key: train_accuracy
value: [0.98701299 0.97402597 0.94805195 0.96103896 0.97402597 0.96103896
0.94871795 0.97435897 1. 0.98717949]
mean value: 0.9715451215451215
key: test_fscore
value: [0.75 0.66666667 0.88888889 1. 1. 1.
0.88888889 0.75 0.75 1. ]
mean value: 0.8694444444444445
key: train_fscore
value: [0.98701299 0.97435897 0.94871795 0.96103896 0.97368421 0.96103896
0.95 0.97435897 1. 0.98701299]
mean value: 0.971722400406611
key: test_precision
value: [0.75 0.6 0.8 1. 1. 1. 0.8 0.75 0.75 1. ]
mean value: 0.845
key: train_precision
value: [1. 0.97435897 0.94871795 0.94871795 0.97368421 0.94871795
0.92682927 0.97435897 1. 1. ]
mean value: 0.9695385273690793
key: test_recall
value: [0.75 0.75 1. 1. 1. 1. 1. 0.75 0.75 1. ]
mean value: 0.9
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:183: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:186: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_recall
value: [0.97435897 0.97435897 0.94871795 0.97368421 0.97368421 0.97368421
0.97435897 0.97435897 1. 0.97435897]
mean value: 0.9741565452091768
key: test_roc_auc
value: [0.775 0.675 0.9 1. 1. 1. 0.875 0.75 0.75 1. ]
mean value: 0.8725
key: train_roc_auc
value: [0.98717949 0.97402159 0.94804318 0.96120108 0.97402159 0.96120108
0.94871795 0.97435897 1. 0.98717949]
mean value: 0.9715924426450743
key: test_jcc
value: [0.6 0.5 0.8 1. 1. 1. 0.8 0.6 0.6 1. ]
mean value: 0.79
key: train_jcc
value: [0.97435897 0.95 0.90243902 0.925 0.94871795 0.925
0.9047619 0.95 1. 0.97435897]
mean value: 0.9454636826588046
MCC on Blind test: 0.06
Accuracy on Blind test: 0.67
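Note: the SettingWithCopyWarning logged for gid_config.py is typically triggered when a DataFrame obtained by slicing another frame is modified in place, as with sort_values(..., inplace = True). The following is a minimal sketch of one common way to avoid it, assuming rus_CT is derived by filtering a larger scores table. The `all_scores` frame and its `group` column are hypothetical, and the MCC values are rounded from the cross-validation means in this log.

```python
import pandas as pd

# Hypothetical scores table; only the 'test_mcc' column name comes from the log.
all_scores = pd.DataFrame(
    {
        "model": ["QDA", "Gradient Boosting", "Ridge Classifier", "Logistic Regression"],
        "group": ["a", "a", "a", "b"],  # hypothetical grouping column
        "test_mcc": [0.13, 0.90, 0.75, 0.64],
    }
)

# Take an explicit copy of the filtered rows so later changes cannot
# be mistaken for writes into all_scores.
rus_CT = all_scores[all_scores["group"] == "a"].copy()

# Reassign the sorted result instead of sorting in place.
rus_CT = rus_CT.sort_values(by=["test_mcc"], ascending=False)
print(rus_CT)
```

Reassigning the result of sort_values behaves the same whether or not the frame is a slice, which is why it is often preferred over inplace=True.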
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07551575 0.0629611 0.06268406 0.06178474 0.06372309 0.06233025
0.06263781 0.06301665 0.06373763 0.06232142]
mean value: 0.06407124996185302
key: score_time
value: [0.00872803 0.00875998 0.00883865 0.0087626 0.00883889 0.00868249
0.00857091 0.00899076 0.00884104 0.00882053]
mean value: 0.008783388137817382
key: test_mcc
value: [0.8 0.35 0.8 1. 1. 0.8
1. 0.77459667 0.5 0.77459667]
mean value: 0.7799193338482967
key: train_mcc
value: [0.94804318 0.94804318 0.94804318 0.92240216 0.94804318 0.94804318
0.94871795 1. 1. 0.94871795]
mean value: 0.9560053981106613
key: test_accuracy
value: [0.88888889 0.66666667 0.88888889 1. 1. 0.88888889
1. 0.875 0.75 0.875 ]
mean value: 0.8833333333333333
key: train_accuracy
value: [0.97402597 0.97402597 0.97402597 0.96103896 0.97402597 0.97402597
0.97435897 1. 1. 0.97435897]
mean value: 0.977988677988678
key: test_fscore
value: [0.88888889 0.66666667 0.88888889 1. 1. 0.88888889
1. 0.88888889 0.75 0.85714286]
mean value: 0.8829365079365079
key: train_fscore
value: [0.97435897 0.97435897 0.97435897 0.96103896 0.97368421 0.97368421
0.97435897 1. 1. 0.97435897]
mean value: 0.9780202253886464
key: test_precision
value: [0.8 0.6 0.8 1. 1. 1. 1. 0.8 0.75 1. ]
mean value: 0.875
key: train_precision
value: [0.97435897 0.97435897 0.97435897 0.94871795 0.97368421 0.97368421
0.97435897 1. 1. 0.97435897]
mean value: 0.9767881241565453
key: test_recall
value: [1. 0.75 1. 1. 1. 0.8 1. 1. 0.75 0.75]
mean value: 0.905
key: train_recall
value: [0.97435897 0.97435897 0.97435897 0.97368421 0.97368421 0.97368421
0.97435897 1. 1. 0.97435897]
mean value: 0.9792847503373819
key: test_roc_auc
value: [0.9 0.675 0.9 1. 1. 0.9 1. 0.875 0.75 0.875]
mean value: 0.8875
key: train_roc_auc
value: [0.97402159 0.97402159 0.97402159 0.96120108 0.97402159 0.97402159
0.97435897 1. 1. 0.97435897]
mean value: 0.9780026990553307
key: test_jcc
value: [0.8 0.5 0.8 1. 1. 0.8 1. 0.8 0.6 0.75]
mean value: 0.805
key: train_jcc
value: [0.95 0.95 0.95 0.925 0.94871795 0.94871795
0.95 1. 1. 0.95 ]
mean value: 0.9572435897435897
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.0179987 0.01539111 0.01446462 0.01311445 0.01305366 0.01297665
0.01468205 0.01300359 0.01401901 0.01403546]
mean value: 0.014273929595947265
key: score_time
value: [0.01068687 0.00844073 0.00901723 0.00842404 0.00852823 0.00848007
0.00915885 0.00892854 0.00850725 0.00879216]
mean value: 0.008896398544311523
key: test_mcc
value: [0.51639778 0.62994079 0.73214286 0.49099025 0.87287156 0.87287156
0.46428571 0.32732684 0.64465837 0.875 ]
mean value: 0.6426485720764821
key: train_mcc
value: [0.808911 0.79446135 0.78111679 0.82629176 0.83951407 0.76668815
0.81031543 0.8251228 0.81092683 0.81027501]
mean value: 0.8073623185403057
key: test_accuracy
value: [0.75 0.8125 0.86666667 0.73333333 0.93333333 0.93333333
0.73333333 0.66666667 0.8 0.93333333]
mean value: 0.81625
key: train_accuracy
value: [0.90441176 0.89705882 0.89051095 0.91240876 0.91970803 0.88321168
0.90510949 0.91240876 0.90510949 0.90510949]
mean value: 0.903504723057106
key: test_fscore
value: [0.77777778 0.8 0.85714286 0.75 0.92307692 0.92307692
0.75 0.70588235 0.84210526 0.93333333]
mean value: 0.8262395430506886
key: train_fscore
value: [0.9037037 0.89552239 0.89051095 0.91044776 0.91970803 0.88571429
0.90510949 0.91044776 0.90225564 0.9037037 ]
mean value: 0.9027123709820484
key: test_precision
value: [0.7 0.85714286 0.85714286 0.66666667 1. 1.
0.75 0.66666667 0.72727273 1. ]
mean value: 0.8224891774891775
key: train_precision
value: [0.91044776 0.90909091 0.89705882 0.93846154 0.92647059 0.87323944
0.89855072 0.92424242 0.92307692 0.91044776]
mean value: 0.911108689028196
key: test_recall
value: [0.875 0.75 0.85714286 0.85714286 0.85714286 0.85714286
0.75 0.75 1. 0.875 ]
mean value: 0.8428571428571429
key: train_recall
value: [0.89705882 0.88235294 0.88405797 0.88405797 0.91304348 0.89855072
0.91176471 0.89705882 0.88235294 0.89705882]
mean value: 0.8947357203751065
key: test_roc_auc
value: [0.75 0.8125 0.86607143 0.74107143 0.92857143 0.92857143
0.73214286 0.66071429 0.78571429 0.9375 ]
mean value: 0.8142857142857143
key: train_roc_auc
value: [0.90441176 0.89705882 0.8905584 0.91261722 0.91975703 0.88309889
0.90515772 0.91229753 0.90494459 0.90505115]
mean value: 0.9034953111679455
key: test_jcc
value: [0.63636364 0.66666667 0.75 0.6 0.85714286 0.85714286
0.6 0.54545455 0.72727273 0.875 ]
mean value: 0.711504329004329
key: train_jcc
value: [0.82432432 0.81081081 0.80263158 0.83561644 0.85135135 0.79487179
0.82666667 0.83561644 0.82191781 0.82432432]
mean value: 0.8228131536228147
MCC on Blind test: 0.12
Accuracy on Blind test: 0.66
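Note: the lbfgs ConvergenceWarning logged for LogisticRegressionCV in the next block suggests either scaling the data or raising max_iter. Since the pipeline already applies MinMaxScaler to the numerical columns, raising the iteration limit is the more likely remedy. The following is a minimal sketch under that assumption. The synthetic data and the value max_iter=3000 are illustrative and are not taken from the original script.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not part of this log.
X, y = make_classification(n_samples=152, n_features=10, random_state=42)

model = Pipeline(
    steps=[
        ("prep", MinMaxScaler()),
        # max_iter defaults to 100; 3000 is an arbitrary larger limit for illustration.
        ("model", LogisticRegressionCV(max_iter=3000, random_state=42)),
    ]
)
model.fit(X, y)
print(model.named_steps["model"].C_)
```

If the warning persists with a higher limit, the scikit-learn documentation linked in the warning lists alternative solvers that can be tried through the `solver` parameter.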
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.36799169 0.37216139 0.3838408 0.37895703 0.3886342 0.3847723
0.39137197 0.39352298 0.38147902 0.38831353]
mean value: 0.3831044912338257
key: score_time
value: [0.00858855 0.00922418 0.00917625 0.00932384 0.00937819 0.00943565
0.00947142 0.00946307 0.00953293 0.00940537]
mean value: 0.009299945831298829
key: test_mcc
value: [0.62994079 0.8819171 0.875 0.49099025 1. 0.73214286
0.6000992 0.87287156 0.64465837 0.875 ]
mean value: 0.7602620132524002
key: train_mcc
value: [0.85294118 1. 1. 0.88360693 1. 1.
1. 1. 0.88355744 1. ]
mean value: 0.9620105545903546
key: test_accuracy
value: [0.8125 0.9375 0.93333333 0.73333333 1. 0.86666667
0.8 0.93333333 0.8 0.93333333]
mean value: 0.875
key: train_accuracy
value: [0.92647059 1. 1. 0.94160584 1. 1.
1. 1. 0.94160584 1. ]
mean value: 0.9809682267067411
key: test_fscore
value: [0.82352941 0.94117647 0.93333333 0.75 1. 0.85714286
0.82352941 0.94117647 0.84210526 0.93333333]
mean value: 0.8845326551673302
key: train_fscore
value: [0.92647059 1. 1. 0.94117647 1. 1.
1. 1. 0.94029851 1. ]
mean value: 0.9807945566286216
key: test_precision
value: [0.77777778 0.88888889 0.875 0.66666667 1. 0.85714286
0.77777778 0.88888889 0.72727273 1. ]
mean value: 0.8459415584415584
key: train_precision
value: [0.92647059 1. 1. 0.95522388 1. 1.
1. 1. 0.95454545 1. ]
mean value: 0.9836239923377763
key: test_recall
value: [0.875 1. 1. 0.85714286 1. 0.85714286
0.875 1. 1. 0.875 ]
mean value: 0.9339285714285714
key: train_recall
value: [0.92647059 1. 1. 0.92753623 1. 1.
1. 1. 0.92647059 1. ]
mean value: 0.9780477408354646
key: test_roc_auc
value: [0.8125 0.9375 0.9375 0.74107143 1. 0.86607143
0.79464286 0.92857143 0.78571429 0.9375 ]
mean value: 0.8741071428571429
key: train_roc_auc
value: [0.92647059 1. 1. 0.94170929 1. 1.
1. 1. 0.94149616 1. ]
mean value: 0.9809676044330776
key: test_jcc
value: [0.7 0.88888889 0.875 0.6 1. 0.75
0.7 0.88888889 0.72727273 0.875 ]
mean value: 0.8005050505050505
key: train_jcc
value: [0.8630137 1. 1. 0.88888889 1. 1.
1. 1. 0.88732394 1. ]
mean value: 0.9639226531180998
MCC on Blind test: 0.0
Accuracy on Blind test: 0.68
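Note on the two blind-test lines above: an MCC of 0.0 next to an accuracy of 0.68 is the pattern one would expect if, for instance, every blind-test sample were assigned to the majority class. The log does not record the blind-test confusion matrix, so this is only one possible explanation. The sketch below uses HYPOTHETICAL counts chosen to reproduce that pattern; the helper names are illustrative, and returning 0.0 when the denominator is zero is assumed to mirror what scikit-learn's `matthews_corrcoef` does.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined case (e.g. only one class is ever predicted):
        # report 0.0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


# HYPOTHETICAL counts (not taken from the log): all samples predicted as class 0.
hyp = dict(tp=0, tn=68, fp=0, fn=32)

print("MCC:", mcc_from_counts(**hyp))
print("Accuracy:", accuracy_from_counts(**hyp))
```

Under these assumed counts the accuracy only reflects the class balance, which is why MCC is the more informative of the two figures on an imbalanced blind set.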
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.00949454 0.00907326 0.00779319 0.00751305 0.00748301 0.0074904
0.00753403 0.00772738 0.00752568 0.00749397]
mean value: 0.007912850379943848
key: score_time
value: [0.01055598 0.01037431 0.00883818 0.00855374 0.00868988 0.00856733
0.00868702 0.00864244 0.00872922 0.00876641]
mean value: 0.009040451049804688
key: test_mcc
value: [0.37796447 0.25 0.60714286 0.26189246 0.46428571 0.56407607
0.19642857 0.41931393 0.21821789 0.34247476]
mean value: 0.3701796738627931
key: train_mcc
value: [0.59233863 0.52313884 0.49254979 0.53036644 0.56781069 0.53654458
0.71021843 0.58848522 0.56432157 0.58903512]
mean value: 0.5694809310571065
key: test_accuracy
value: [0.625 0.625 0.8 0.6 0.73333333 0.73333333
0.6 0.66666667 0.6 0.66666667]
mean value: 0.665
key: train_accuracy
value: [0.78676471 0.75 0.72992701 0.75912409 0.76642336 0.75182482
0.84671533 0.7810219 0.77372263 0.77372263]
mean value: 0.7719246457707171
key: test_fscore
value: [0.72727273 0.625 0.8 0.66666667 0.71428571 0.77777778
0.625 0.76190476 0.57142857 0.73684211]
mean value: 0.7006178324599377
key: train_fscore
value: [0.81045752 0.78205128 0.77300613 0.78431373 0.80246914 0.79012346
0.82644628 0.80769231 0.7394958 0.80745342]
mean value: 0.7923509054595705
key: test_precision
value: [0.57142857 0.625 0.75 0.54545455 0.71428571 0.63636364
0.625 0.61538462 0.66666667 0.63636364]
mean value: 0.6385947385947386
key: train_precision
value: [0.72941176 0.69318182 0.67021277 0.71428571 0.69892473 0.68817204
0.94339623 0.71590909 0.8627451 0.69892473]
mean value: 0.7415163983870607
key: test_recall
value: [1. 0.625 0.85714286 0.85714286 0.71428571 1.
0.625 1. 0.5 0.875 ]
mean value: 0.8053571428571429
key: train_recall
value: [0.91176471 0.89705882 0.91304348 0.86956522 0.94202899 0.92753623
0.73529412 0.92647059 0.64705882 0.95588235]
mean value: 0.8725703324808184
key: test_roc_auc
value: [0.625 0.625 0.80357143 0.61607143 0.73214286 0.75
0.59821429 0.64285714 0.60714286 0.65178571]
mean value: 0.6651785714285714
key: train_roc_auc
value: [0.78676471 0.75 0.72858056 0.75831202 0.76513214 0.75053282
0.84590793 0.78207587 0.77280477 0.77504263]
mean value: 0.7715153452685422
key: test_jcc
value: [0.57142857 0.45454545 0.66666667 0.5 0.55555556 0.63636364
0.45454545 0.61538462 0.4 0.58333333]
mean value: 0.5437823287823288
key: train_jcc
value: [0.68131868 0.64210526 0.63 0.64516129 0.67010309 0.65306122
0.70422535 0.67741935 0.58666667 0.67708333]
mean value: 0.6567144259023844
MCC on Blind test: 0.02
Accuracy on Blind test: 0.47
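Note on `test_fscore` and `test_jcc`: for a single positive class the F1 score and the Jaccard index carry the same information, since J = F / (2 - F). The sketch below checks this against the per-fold Gaussian NB values printed above. It assumes both scorers are the binary, positive-class variants; the log does not state how they were defined, and `jaccard_from_f1` is an illustrative helper name.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score into the equivalent Jaccard index."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the Gaussian NB block of the log.
test_fscore = [0.72727273, 0.625, 0.8, 0.66666667, 0.71428571,
               0.77777778, 0.625, 0.76190476, 0.57142857, 0.73684211]
test_jcc = [0.57142857, 0.45454545, 0.66666667, 0.5, 0.55555556,
            0.63636364, 0.45454545, 0.61538462, 0.4, 0.58333333]

# Largest disagreement between the logged Jaccard value and F/(2-F).
max_diff = max(abs(jaccard_from_f1(f) - j)
               for f, j in zip(test_fscore, test_jcc))
print(f"largest |J - F/(2-F)| over folds: {max_diff:.2e}")
```

The two columns agree to within the rounding of the printed values, so ranking models by one of them ranks them the same way by the other.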
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.00795698 0.00770211 0.00773144 0.00778985 0.00778174 0.00784087
0.00769806 0.0066731 0.00672388 0.0066855 ]
mean value: 0.007458353042602539
key: score_time
value: [0.00865579 0.00877738 0.00871015 0.00862479 0.00876021 0.00872946
0.00876045 0.00781512 0.00778174 0.00782132]
mean value: 0.008443641662597656
key: test_mcc
value: [ 0.25 -0.25 0.73214286 0.09449112 0.75592895 0.49099025
0.33928571 -0.13363062 0.33928571 0.19642857]
mean value: 0.2814922553488389
key: train_mcc
value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592
0.41602728 0.48933032 0.41632915 0.44553401]
mean value: 0.4562846929723249
key: test_accuracy
value: [0.625 0.375 0.86666667 0.53333333 0.86666667 0.73333333
0.66666667 0.46666667 0.66666667 0.6 ]
mean value: 0.64
key: train_accuracy
value: [0.75 0.77205882 0.72262774 0.73722628 0.68613139 0.72262774
0.7080292 0.74452555 0.7080292 0.72262774]
mean value: 0.727388364104766
key: test_fscore
value: [0.625 0.375 0.85714286 0.58823529 0.83333333 0.75
0.66666667 0.6 0.66666667 0.625 ]
mean value: 0.658704481792717
key: train_fscore
value: [0.76056338 0.7862069 0.74324324 0.75342466 0.68148148 0.72463768
0.70588235 0.73684211 0.71014493 0.72463768]
mean value: 0.7327064407151792
key: test_precision
value: [0.625 0.375 0.85714286 0.5 1. 0.66666667
0.71428571 0.5 0.71428571 0.625 ]
mean value: 0.6577380952380952
key: train_precision
value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697 0.72463768
0.70588235 0.75384615 0.7 0.71428571]
mean value: 0.7176099315122916
key: test_recall
value: [0.625 0.375 0.85714286 0.71428571 0.71428571 0.85714286
0.625 0.75 0.625 0.625 ]
mean value: 0.6767857142857143
key: train_recall
value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768
0.70588235 0.72058824 0.72058824 0.73529412]
mean value: 0.7500213128729752
key: test_roc_auc
value: [0.625 0.375 0.86607143 0.54464286 0.85714286 0.74107143
0.66964286 0.44642857 0.66964286 0.59821429]
mean value: 0.6392857142857143
key: train_roc_auc
value: [0.75 0.77205882 0.72208014 0.73678602 0.68627451 0.72261296
0.70801364 0.74435209 0.7081202 0.72271952]
mean value: 0.7273017902813299
key: test_jcc
value: [0.45454545 0.23076923 0.75 0.41666667 0.71428571 0.6
0.5 0.42857143 0.5 0.45454545]
mean value: 0.504938394938395
key: train_jcc
value: [0.61363636 0.64772727 0.59139785 0.6043956 0.51685393 0.56818182
0.54545455 0.58333333 0.5505618 0.56818182]
mean value: 0.57897243357102
MCC on Blind test: 0.1
Accuracy on Blind test: 0.58
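Note on the `mean value` lines: they are consistent with a plain arithmetic mean over the ten per-fold values. The sketch below recomputes the mean for the BernoulliNB `test_mcc` values above and also reports the fold-to-fold spread, which the log does not print. The sample standard deviation is an addition here, not something the original script is known to compute.

```python
import statistics

# Per-fold test_mcc copied from the BernoulliNB block of the log.
test_mcc = [0.25, -0.25, 0.73214286, 0.09449112, 0.75592895,
            0.49099025, 0.33928571, -0.13363062, 0.33928571, 0.19642857]

mean_mcc = statistics.mean(test_mcc)   # arithmetic mean over the 10 folds
sd_mcc = statistics.stdev(test_mcc)    # sample standard deviation (n - 1)

print(f"mean test_mcc: {mean_mcc:.6f}")
print(f"sd test_mcc:   {sd_mcc:.6f}")
print(f"range:         {min(test_mcc)} to {max(test_mcc)}")
```

With folds of only 15 or 16 samples, per-fold MCC swings from negative to above 0.7, so the mean on its own hides a lot of variability.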
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.00700665 0.00714684 0.00702357 0.007092 0.00712657 0.0071609
0.00703955 0.00731921 0.00701737 0.00706315]
mean value: 0.007099580764770508
key: score_time
value: [0.00979042 0.00942588 0.00939441 0.00933671 0.00949192 0.00942111
0.009372 0.00936747 0.00985074 0.00934005]
mean value: 0.009479069709777832
key: test_mcc
value: [ 0.51639778 0.25819889 0.73214286 0.21821789 0.75592895 0.32732684
-0.02620712 0.32732684 0.73214286 0.60714286]
mean value: 0.44486186267144306
key: train_mcc
value: [0.72254413 0.69486799 0.68583647 0.72439971 0.62437433 0.68322489
0.68163703 0.68163703 0.68011153 0.65087548]
mean value: 0.6829508591825769
key: test_accuracy
value: [0.75 0.625 0.86666667 0.6 0.86666667 0.66666667
0.46666667 0.66666667 0.86666667 0.8 ]
mean value: 0.7175
key: train_accuracy
value: [0.86029412 0.84558824 0.83941606 0.86131387 0.81021898 0.83941606
0.83941606 0.83941606 0.83941606 0.82481752]
mean value: 0.8399313009875483
key: test_fscore
value: [0.77777778 0.66666667 0.85714286 0.625 0.83333333 0.61538462
0.2 0.70588235 0.875 0.8 ]
mean value: 0.6956187603246426
key: train_fscore
value: [0.86524823 0.85314685 0.85135135 0.86713287 0.82191781 0.84931507
0.84507042 0.84507042 0.84285714 0.82857143]
mean value: 0.8469681591792749
key: test_precision
value: [0.7 0.6 0.85714286 0.55555556 1. 0.66666667
0.5 0.66666667 0.875 0.85714286]
mean value: 0.7278174603174603
key: train_precision
value: [0.83561644 0.81333333 0.79746835 0.83783784 0.77922078 0.80519481
0.81081081 0.81081081 0.81944444 0.80555556]
mean value: 0.8115293169994922
key: test_recall
value: [0.875 0.75 0.85714286 0.71428571 0.71428571 0.57142857
0.125 0.75 0.875 0.75 ]
mean value: 0.6982142857142857
key: train_recall
value: [0.89705882 0.89705882 0.91304348 0.89855072 0.86956522 0.89855072
0.88235294 0.88235294 0.86764706 0.85294118]
mean value: 0.8859121909633418
key: test_roc_auc
value: [0.75 0.625 0.86607143 0.60714286 0.85714286 0.66071429
0.49107143 0.66071429 0.86607143 0.80357143]
mean value: 0.71875
key: train_roc_auc
value: [0.86029412 0.84558824 0.83887468 0.86104007 0.80978261 0.83898124
0.8397272 0.8397272 0.83962063 0.82502131]
mean value: 0.8398657289002557
key: test_jcc
value: [0.63636364 0.5 0.75 0.45454545 0.71428571 0.44444444
0.11111111 0.54545455 0.77777778 0.66666667]
mean value: 0.560064935064935
key: train_jcc
value: [0.7625 0.74390244 0.74117647 0.7654321 0.69767442 0.73809524
0.73170732 0.73170732 0.72839506 0.70731707]
mean value: 0.7347907434123415
MCC on Blind test: 0.06
Accuracy on Blind test: 0.68
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.00901961 0.00860167 0.0086236 0.00857639 0.00859451 0.00781941
0.00762987 0.00763583 0.00852704 0.00775576]
mean value: 0.008278369903564453
key: score_time
value: [0.00887251 0.00862575 0.00865078 0.00868511 0.00861216 0.00795341
0.00790024 0.00792432 0.00796032 0.00794959]
mean value: 0.008313417434692383
key: test_mcc
value: [0.62994079 0.62994079 0.73214286 0.56407607 0.87287156 0.60714286
0.33928571 0.18898224 0.75592895 0.875 ]
mean value: 0.6195311823553656
key: train_mcc
value: [0.77949606 0.85331034 0.85540562 0.86948194 0.82629176 0.86939892
0.8978896 0.83947987 0.85400682 0.86868474]
mean value: 0.8513445663864698
key: test_accuracy
value: [0.8125 0.8125 0.86666667 0.73333333 0.93333333 0.8
0.66666667 0.6 0.86666667 0.93333333]
mean value: 0.8025
key: train_accuracy
value: [0.88970588 0.92647059 0.9270073 0.93430657 0.91240876 0.93430657
0.94890511 0.91970803 0.9270073 0.93430657]
mean value: 0.9254132674967797
key: test_fscore
value: [0.82352941 0.8 0.85714286 0.77777778 0.92307692 0.8
0.66666667 0.66666667 0.88888889 0.93333333]
mean value: 0.8137082525317819
key: train_fscore
value: [0.88888889 0.92753623 0.92957746 0.93333333 0.91044776 0.93617021
0.94814815 0.91851852 0.92647059 0.93333333]
mean value: 0.9252424481090294
key: test_precision
value: [0.77777778 0.85714286 0.85714286 0.63636364 1. 0.75
0.71428571 0.6 0.8 1. ]
mean value: 0.7992712842712842
key: train_precision
value: [0.89552239 0.91428571 0.90410959 0.95454545 0.93846154 0.91666667
0.95522388 0.92537313 0.92647059 0.94029851]
mean value: 0.9270957461683526
key: test_recall
value: [0.875 0.75 0.85714286 1. 0.85714286 0.85714286
0.625 0.75 1. 0.875 ]
mean value: 0.8446428571428571
key: train_recall
value: [0.88235294 0.94117647 0.95652174 0.91304348 0.88405797 0.95652174
0.94117647 0.91176471 0.92647059 0.92647059]
mean value: 0.9239556692242115
key: test_roc_auc
value: [0.8125 0.8125 0.86607143 0.75 0.92857143 0.80357143
0.66964286 0.58928571 0.85714286 0.9375 ]
mean value: 0.8026785714285715
key: train_roc_auc
value: [0.88970588 0.92647059 0.92679028 0.93446292 0.91261722 0.93414322
0.9488491 0.91965047 0.92700341 0.93424979]
mean value: 0.9253942881500427
key: test_jcc
value: [0.7 0.66666667 0.75 0.63636364 0.85714286 0.66666667
0.5 0.5 0.8 0.875 ]
mean value: 0.6951839826839826
key: train_jcc
value: [0.8 0.86486486 0.86842105 0.875 0.83561644 0.88
0.90140845 0.84931507 0.8630137 0.875 ]
mean value: 0.8612639573680121
MCC on Blind test: 0.13
Accuracy on Blind test: 0.69
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.47060013 0.6176157 0.50893569 0.47440553 0.48640704 0.68746996
0.48285437 0.49517989 0.48710799 0.6235292 ]
mean value: 0.5334105491638184
key: score_time
value: [0.01105475 0.01343441 0.01317406 0.01111579 0.01340437 0.01400685
0.01163292 0.01111388 0.01380134 0.01445436]
mean value: 0.012719273567199707
key: test_mcc
value: [0.77459667 0.75 0.87287156 0.49099025 1. 0.73214286
0.47245559 0.32732684 0.75592895 0.73214286]
mean value: 0.6908455570136127
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.875 0.93333333 0.73333333 1. 0.86666667
0.73333333 0.66666667 0.86666667 0.86666667]
mean value: 0.8416666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.875 0.92307692 0.75 1. 0.85714286
0.77777778 0.70588235 0.88888889 0.875 ]
mean value: 0.8541657688716512
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.8 0.875 1. 0.66666667 1. 0.85714286
0.7 0.66666667 0.8 0.875 ]
mean value: 0.824047619047619
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286
0.875 0.75 1. 0.875 ]
mean value: 0.8946428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.875 0.92857143 0.74107143 1. 0.86607143
0.72321429 0.66071429 0.85714286 0.86607143]
mean value: 0.8392857142857143
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.77777778 0.85714286 0.6 1. 0.75
0.63636364 0.54545455 0.8 0.77777778]
mean value: 0.7544516594516595
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.06
Accuracy on Blind test: 0.69
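Note on train, cross-validation and blind-test MCC: the models logged in this part of the file score far lower on the blind test than in cross-validation, and the MLP additionally reaches a training MCC of 1.0 in every fold. The sketch below tabulates both gaps from the mean values and blind-test figures printed above. The `gaps` helper and the choice to sort by the CV-to-blind gap are illustrative assumptions, not part of the original script.

```python
# (model, mean train_mcc, mean test_mcc, blind-test MCC), copied from the log.
results = [
    ("Logistic RegressionCV", 0.9620105545903546, 0.7602620132524002, 0.0),
    ("Gaussian NB", 0.5694809310571065, 0.3701796738627931, 0.02),
    ("Naive Bayes (BernoulliNB)", 0.4562846929723249, 0.2814922553488389, 0.1),
    ("K-Nearest Neighbors", 0.6829508591825769, 0.44486186267144306, 0.06),
    ("SVM", 0.8513445663864698, 0.6195311823553656, 0.13),
    ("MLP", 1.0, 0.6908455570136127, 0.06),
]


def gaps(train, cv, blind):
    """Return (train - CV, CV - blind) differences in MCC."""
    return train - cv, cv - blind


# Sort so the model with the largest CV-to-blind drop comes first.
ranked = sorted(results, key=lambda r: gaps(r[1], r[2], r[3])[1], reverse=True)
worst = ranked[0][0]

print(f"{'model':<28}{'train-cv':>10}{'cv-blind':>10}")
for name, train, cv, blind in ranked:
    g_train_cv, g_cv_blind = gaps(train, cv, blind)
    print(f"{name:<28}{g_train_cv:>10.3f}{g_cv_blind:>10.3f}")
```

A large train-to-CV gap points at overfitting within the training data, while a large CV-to-blind gap suggests the blind set differs from the training distribution; the log alone does not say which applies here.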
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.01054811 0.01029015 0.00749731 0.00756335 0.00791764 0.00744152
0.0080564 0.0079093 0.00725198 0.00789857]
mean value: 0.008237433433532716
key: score_time
value: [0.01101589 0.00920248 0.00816894 0.00859499 0.00827861 0.00804591
0.00809813 0.00824928 0.00804496 0.00812697]
mean value: 0.008582615852355957
key: test_mcc
value: [1. 0.77459667 0.875 0.76376262 1. 0.87287156
1. 1. 0.87287156 0.875 ]
mean value: 0.9034102406955395
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.875 0.93333333 0.86666667 1. 0.93333333
1. 1. 0.93333333 0.93333333]
mean value: 0.9475
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.88888889 0.93333333 0.875 1. 0.92307692
1. 1. 0.94117647 0.93333333]
mean value: 0.9494808949220714
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.8 0.875 0.77777778 1. 1.
1. 1. 0.88888889 1. ]
mean value: 0.9341666666666667
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.875 0.9375 0.875 1. 0.92857143
1. 1. 0.92857143 0.9375 ]
mean value: 0.9482142857142857
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.8 0.875 0.77777778 1. 0.85714286
1. 1. 0.88888889 0.875 ]
mean value: 0.9073809523809524
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.13
Accuracy on Blind test: 0.85
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.08002043 0.08039927 0.08053231 0.08475113 0.0849824 0.07976866
0.07944989 0.08087707 0.07947731 0.0836072 ]
mean value: 0.08138656616210938
key: score_time
value: [0.01744008 0.01705742 0.01676226 0.01815081 0.01665783 0.01678061
0.01786375 0.0182426 0.01667714 0.01768732]
mean value: 0.017331981658935548
key: test_mcc
value: [0.8819171 0.75 0.87287156 0.66143783 1. 0.87287156
0.46428571 0.76376262 0.875 0.76376262]
mean value: 0.7905908999279945
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.875 0.93333333 0.8 1. 0.93333333
0.73333333 0.86666667 0.93333333 0.86666667]
mean value: 0.8879166666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.875 0.92307692 0.82352941 1. 0.92307692
0.75 0.85714286 0.93333333 0.85714286]
mean value: 0.8883478776125835
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.875 1. 0.7 1. 1.
0.75 1. 1. 1. ]
mean value: 0.9213888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 0.85714286 1. 1. 0.85714286
0.75 0.75 0.875 0.75 ]
mean value: 0.8714285714285714
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.875 0.92857143 0.8125 1. 0.92857143
0.73214286 0.875 0.9375 0.875 ]
mean value: 0.8901785714285715
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.77777778 0.85714286 0.7 1. 0.85714286
0.6 0.75 0.875 0.75 ]
mean value: 0.805595238095238
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.1
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.00745249 0.0100553 0.00690699 0.00686359 0.00682211 0.0068984
0.00738096 0.00688457 0.00711179 0.00706244]
mean value: 0.007343864440917969
key: score_time
value: [0.00842643 0.00823832 0.00787807 0.00817585 0.00788713 0.00785685
0.00784945 0.00778127 0.00796604 0.00789261]
mean value: 0.007995200157165528
key: test_mcc
value: [1. 0.40451992 0.60714286 0.875 0.76376262 0.33928571
0.76376262 0.46428571 0.75592895 0.875 ]
mean value: 0.6848688380862632
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.6875 0.8 0.93333333 0.86666667 0.66666667
0.86666667 0.73333333 0.86666667 0.93333333]
mean value: 0.8354166666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.73684211 0.8 0.93333333 0.875 0.66666667
0.85714286 0.75 0.88888889 0.93333333]
mean value: 0.8441207184628238
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.63636364 0.75 0.875 0.77777778 0.625
1. 0.75 0.8 1. ]
mean value: 0.8214141414141414
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.875 0.85714286 1. 1. 0.71428571
0.75 0.75 1. 0.875 ]
mean value: 0.8821428571428571
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.6875 0.80357143 0.9375 0.875 0.66964286
0.875 0.73214286 0.85714286 0.9375 ]
mean value: 0.8375
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.58333333 0.66666667 0.875 0.77777778 0.5
0.75 0.6 0.8 0.875 ]
mean value: 0.7427777777777778
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.73
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [0.99027419 1.03364635 0.99537587 0.99380183 1.01279187 1.00695038
1.00400448 0.99069548 0.98471999 0.9896822 ]
mean value: 1.000194263458252
key: score_time
value: [0.09284711 0.09792686 0.09596872 0.09674335 0.097049 0.09700847
0.09636211 0.08898997 0.08923626 0.15491176]
mean value: 0.10070436000823975
key: test_mcc
value: [0.8819171 0.8819171 0.875 0.76376262 1. 0.87287156
0.60714286 0.87287156 1. 0.73214286]
mean value: 0.848762565937602
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.9375 0.93333333 0.86666667 1. 0.93333333
0.8 0.93333333 1. 0.86666667]
mean value: 0.9208333333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.94117647 0.93333333 0.875 1. 0.92307692
0.8 0.94117647 1. 0.875 ]
mean value: 0.9229939668174962
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.88888889 0.875 0.77777778 1. 1.
0.85714286 0.88888889 1. 0.875 ]
mean value: 0.9051587301587302
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
0.75 1. 1. 0.875 ]
mean value: 0.9482142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.9375 0.9375 0.875 1. 0.92857143
0.80357143 0.92857143 1. 0.86607143]
mean value: 0.9214285714285715
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.88888889 0.875 0.77777778 1. 0.85714286
0.66666667 0.88888889 1. 0.77777778]
mean value: 0.8621031746031745
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.79997134 0.8617053 0.82350397 0.86626768 0.86152625 0.8775835
0.89794159 0.84732342 0.82997847 0.88673472]
mean value: 0.8552536249160767
key: score_time
value: [0.23055267 0.18991017 0.19632292 0.2545855 0.13287044 0.18487072
0.21556759 0.20604992 0.17664123 0.12801123]
mean value: 0.19153823852539062
key: test_mcc
value: [0.8819171 0.75 0.875 0.76376262 0.87287156 0.73214286
0.60714286 0.73214286 1. 0.73214286]
mean value: 0.7947122709029568
key: train_mcc
value: [0.97100831 0.94117647 0.95710706 0.98550418 0.95630861 0.97080136
0.98550418 0.98550725 0.97122151 0.98550725]
mean value: 0.9709646177394017
key: test_accuracy
value: [0.9375 0.875 0.93333333 0.86666667 0.93333333 0.86666667
0.8 0.86666667 1. 0.86666667]
mean value: 0.8945833333333334
key: train_accuracy
value: [0.98529412 0.97058824 0.97810219 0.99270073 0.97810219 0.98540146
0.99270073 0.99270073 0.98540146 0.99270073]
mean value: 0.9853692571919279
key: test_fscore
value: [0.94117647 0.875 0.93333333 0.875 0.92307692 0.85714286
0.8 0.875 1. 0.875 ]
mean value: 0.8954729584141349
key: train_fscore
value: [0.98550725 0.97058824 0.9787234 0.99280576 0.97810219 0.98550725
0.99259259 0.99270073 0.98550725 0.99270073]
mean value: 0.9854735376303184
key: test_precision
value: [0.88888889 0.875 0.875 0.77777778 1. 0.85714286
0.85714286 0.875 1. 0.875 ]
mean value: 0.888095238095238
key: train_precision
value: [0.97142857 0.97058824 0.95833333 0.98571429 0.98529412 0.98550725
1. 0.98550725 0.97142857 0.98550725]
mean value: 0.9799308853976374
key: test_recall
value: [1. 0.875 1. 1. 0.85714286 0.85714286
0.75 0.875 1. 0.875 ]
mean value: 0.9089285714285714
key: train_recall
value: [1. 0.97058824 1. 1. 0.97101449 0.98550725
0.98529412 1. 1. 1. ]
mean value: 0.9912404092071612
key: test_roc_auc
value: [0.9375 0.875 0.9375 0.875 0.92857143 0.86607143
0.80357143 0.86607143 1. 0.86607143]
mean value: 0.8955357142857143
key: train_roc_auc
value: [0.98529412 0.97058824 0.97794118 0.99264706 0.97815431 0.98540068
0.99264706 0.99275362 0.98550725 0.99275362]
mean value: 0.9853687127024723
key: test_jcc
value: [0.88888889 0.77777778 0.875 0.77777778 0.85714286 0.75
0.66666667 0.77777778 1. 0.77777778]
mean value: 0.8148809523809524
key: train_jcc
value: [0.97142857 0.94285714 0.95833333 0.98571429 0.95714286 0.97142857
0.98529412 0.98550725 0.97142857 0.98550725]
mean value: 0.9714641943734016
MCC on Blind test: 0.1
Accuracy on Blind test: 0.79
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01698375 0.00691271 0.00714707 0.00702286 0.00691867 0.00681043
0.00706601 0.0070343 0.00683665 0.0071075 ]
mean value: 0.007983994483947755
key: score_time
value: [0.01220894 0.00788188 0.00856304 0.00795507 0.0079546 0.00788164
0.00790548 0.00793004 0.00796223 0.00810909]
mean value: 0.008435201644897462
key: test_mcc
value: [ 0.25 -0.25 0.73214286 0.09449112 0.75592895 0.49099025
0.33928571 -0.13363062 0.33928571 0.19642857]
mean value: 0.2814922553488389
key: train_mcc
value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592
0.41602728 0.48933032 0.41632915 0.44553401]
mean value: 0.4562846929723249
key: test_accuracy
value: [0.625 0.375 0.86666667 0.53333333 0.86666667 0.73333333
0.66666667 0.46666667 0.66666667 0.6 ]
mean value: 0.64
key: train_accuracy
value: [0.75 0.77205882 0.72262774 0.73722628 0.68613139 0.72262774
0.7080292 0.74452555 0.7080292 0.72262774]
mean value: 0.727388364104766
key: test_fscore
value: [0.625 0.375 0.85714286 0.58823529 0.83333333 0.75
0.66666667 0.6 0.66666667 0.625 ]
mean value: 0.658704481792717
key: train_fscore
value: [0.76056338 0.7862069 0.74324324 0.75342466 0.68148148 0.72463768
0.70588235 0.73684211 0.71014493 0.72463768]
mean value: 0.7327064407151792
key: test_precision
value: [0.625 0.375 0.85714286 0.5 1. 0.66666667
0.71428571 0.5 0.71428571 0.625 ]
mean value: 0.6577380952380952
key: train_precision
value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697 0.72463768
0.70588235 0.75384615 0.7 0.71428571]
mean value: 0.7176099315122916
key: test_recall
value: [0.625 0.375 0.85714286 0.71428571 0.71428571 0.85714286
0.625 0.75 0.625 0.625 ]
mean value: 0.6767857142857143
key: train_recall
value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768
0.70588235 0.72058824 0.72058824 0.73529412]
mean value: 0.7500213128729752
key: test_roc_auc
value: [0.625 0.375 0.86607143 0.54464286 0.85714286 0.74107143
0.66964286 0.44642857 0.66964286 0.59821429]
mean value: 0.6392857142857143
key: train_roc_auc
value: [0.75 0.77205882 0.72208014 0.73678602 0.68627451 0.72261296
0.70801364 0.74435209 0.7081202 0.72271952]
mean value: 0.7273017902813299
key: test_jcc
value: [0.45454545 0.23076923 0.75 0.41666667 0.71428571 0.6
0.5 0.42857143 0.5 0.45454545]
mean value: 0.504938394938395
key: train_jcc
value: [0.61363636 0.64772727 0.59139785 0.6043956 0.51685393 0.56818182
0.54545455 0.58333333 0.5505618 0.56818182]
mean value: 0.57897243357102
MCC on Blind test: 0.1
Accuracy on Blind test: 0.58
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.06262994 0.03505611 0.03692508 0.03597021 0.06148291 0.03540158
0.03483367 0.03505754 0.04629922 0.03492475]
mean value: 0.04185810089111328
key: score_time
value: [0.01055789 0.01049376 0.01050019 0.01044226 0.01041293 0.01036716
0.01037478 0.0117774 0.01043272 0.01040506]
mean value: 0.010576415061950683
key: test_mcc
value: [1. 0.8819171 0.875 0.76376262 1. 1.
0.87287156 1. 1. 0.875 ]
mean value: 0.9268551280458139
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 0.93333333 0.86666667 1. 1.
0.93333333 1. 1. 0.93333333]
mean value: 0.9604166666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 0.93333333 0.875 1. 1.
0.94117647 1. 1. 0.93333333]
mean value: 0.9624019607843137
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 0.875 0.77777778 1. 1.
0.88888889 1. 1. 1. ]
mean value: 0.9430555555555555
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.875]
mean value: 0.9875
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 0.9375 0.875 1. 1.
0.92857143 1. 1. 0.9375 ]
mean value: 0.9616071428571429
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 0.875 0.77777778 1. 1.
0.88888889 1. 1. 0.875 ]
mean value: 0.9305555555555556
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.01320124 0.01201296 0.01213074 0.01227403 0.01183081 0.01187682
0.01212811 0.01183581 0.01189661 0.03893161]
mean value: 0.014811873435974121
key: score_time
value: [0.0113101 0.01076269 0.01052332 0.01057029 0.0105257 0.01047421
0.0105381 0.01045918 0.01049614 0.01063371]
mean value: 0.01062934398651123
key: test_mcc
value: [0.77459667 0.77459667 0.73214286 0.66143783 0.87287156 0.87287156
0.75592895 0.47245559 0.64465837 1. ]
mean value: 0.7561560053780203
key: train_mcc
value: [0.92898531 0.92737353 0.91392776 0.97120941 0.91277477 0.94318882
0.88668406 0.94323594 0.91597649 0.92791659]
mean value: 0.927127267186985
key: test_accuracy
value: [0.875 0.875 0.86666667 0.8 0.93333333 0.93333333
0.86666667 0.73333333 0.8 1. ]
mean value: 0.8683333333333334
key: train_accuracy
value: [0.96323529 0.96323529 0.95620438 0.98540146 0.95620438 0.97080292
0.94160584 0.97080292 0.95620438 0.96350365]
mean value: 0.9627200515242593
key: test_fscore
value: [0.88888889 0.88888889 0.85714286 0.82352941 0.92307692 0.92307692
0.88888889 0.77777778 0.84210526 1. ]
mean value: 0.8813375822663748
key: train_fscore
value: [0.96453901 0.96402878 0.95774648 0.98571429 0.95714286 0.97183099
0.94366197 0.97142857 0.95774648 0.96402878]
mean value: 0.9637868190827705
key: test_precision
value: [0.8 0.8 0.85714286 0.7 1. 1.
0.8 0.7 0.72727273 1. ]
mean value: 0.8384415584415584
key: train_precision
value: [0.93150685 0.94366197 0.93150685 0.97183099 0.94366197 0.94520548
0.90540541 0.94444444 0.91891892 0.94366197]
mean value: 0.9379804848259411
key: test_recall
value: [1. 1. 0.85714286 1. 0.85714286 0.85714286
1. 0.875 1. 1. ]
mean value: 0.9446428571428571
key: train_recall
value: [1. 0.98529412 0.98550725 1. 0.97101449 1.
0.98529412 1. 1. 0.98529412]
mean value: 0.9912404092071612
key: test_roc_auc
value: [0.875 0.875 0.86607143 0.8125 0.92857143 0.92857143
0.85714286 0.72321429 0.78571429 1. ]
mean value: 0.8651785714285715
key: train_roc_auc
value: [0.96323529 0.96323529 0.95598892 0.98529412 0.95609548 0.97058824
0.94192242 0.97101449 0.95652174 0.96366155]
mean value: 0.9627557544757033
key: test_jcc
value: [0.8 0.8 0.75 0.7 0.85714286 0.85714286
0.8 0.63636364 0.72727273 1. ]
mean value: 0.7927922077922078
key: train_jcc
value: [0.93150685 0.93055556 0.91891892 0.97183099 0.91780822 0.94520548
0.89333333 0.94444444 0.91891892 0.93055556]
mean value: 0.9303078260587425
MCC on Blind test: 0.05
Accuracy on Blind test: 0.64
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.00955915 0.0070889 0.00685692 0.00679111 0.00704551 0.0069344
0.00679803 0.0070312 0.00693274 0.00679612]
mean value: 0.0071834087371826175
key: score_time
value: [0.01106119 0.00804234 0.00776124 0.00791359 0.00784922 0.00808263
0.0079062 0.00786209 0.00788569 0.00790429]
mean value: 0.008226847648620606
key: test_mcc
value: [ 0.12598816 0.25819889 0.73214286 0.33928571 0.87287156 0.37796447
0.19642857 -0.13363062 0.46428571 0.6000992 ]
mean value: 0.3833634515705724
key: train_mcc
value: [0.48661135 0.51745489 0.47592003 0.50667322 0.41725962 0.50373224
0.50394373 0.5339313 0.53314859 0.47473887]
mean value: 0.4953413853595016
key: test_accuracy
value: [0.5625 0.625 0.86666667 0.66666667 0.93333333 0.66666667
0.6 0.46666667 0.73333333 0.8 ]
mean value: 0.6920833333333334
key: train_accuracy
value: [0.74264706 0.75735294 0.73722628 0.75182482 0.7080292 0.75182482
0.75182482 0.76642336 0.76642336 0.73722628]
mean value: 0.747080291970803
key: test_fscore
value: [0.58823529 0.57142857 0.85714286 0.66666667 0.92307692 0.70588235
0.625 0.6 0.75 0.82352941]
mean value: 0.7110962077138547
key: train_fscore
value: [0.75177305 0.76923077 0.75 0.76712329 0.72222222 0.75714286
0.75362319 0.77142857 0.76811594 0.73913043]
mean value: 0.7549790322558434
key: test_precision
value: [0.55555556 0.66666667 0.85714286 0.625 1. 0.6
0.625 0.5 0.75 0.77777778]
mean value: 0.6957142857142857
key: train_precision
value: [0.7260274 0.73333333 0.72 0.72727273 0.69333333 0.74647887
0.74285714 0.75 0.75714286 0.72857143]
mean value: 0.7325017093010533
key: test_recall
value: [0.625 0.5 0.85714286 0.71428571 0.85714286 0.85714286
0.625 0.75 0.75 0.875 ]
mean value: 0.7410714285714286
key: train_recall
value: [0.77941176 0.80882353 0.7826087 0.8115942 0.75362319 0.76811594
0.76470588 0.79411765 0.77941176 0.75 ]
mean value: 0.7792412617220801
key: test_roc_auc
value: [0.5625 0.625 0.86607143 0.66964286 0.92857143 0.67857143
0.59821429 0.44642857 0.73214286 0.79464286]
mean value: 0.6901785714285714
key: train_roc_auc
value: [0.74264706 0.75735294 0.73689258 0.75138534 0.70769395 0.75170503
0.75191816 0.76662404 0.76651748 0.73731884]
mean value: 0.7470055413469735
key: test_jcc
value: [0.41666667 0.4 0.75 0.5 0.85714286 0.54545455
0.45454545 0.42857143 0.6 0.7 ]
mean value: 0.5652380952380952
key: train_jcc
value: [0.60227273 0.625 0.6 0.62222222 0.56521739 0.6091954
0.60465116 0.62790698 0.62352941 0.5862069 ]
mean value: 0.6066202190949461
MCC on Blind test: 0.1
Accuracy on Blind test: 0.6
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.00778508 0.00735497 0.00741458 0.00750518 0.0074389 0.00734568
0.00739765 0.00764251 0.00755811 0.00754356]
mean value: 0.007498621940612793
key: score_time
value: [0.00792003 0.00796342 0.00831413 0.00790501 0.0078187 0.00797248
0.00821495 0.00799203 0.0079875 0.00809884]
mean value: 0.008018708229064942
key: test_mcc
value: [0.62994079 0.62994079 0.875 0.19642857 0.87287156 0.87287156
0.32732684 0.75592895 0.64465837 0.875 ]
mean value: 0.6679967422606682
key: train_mcc
value: [0.88580789 0.91334626 0.89863497 0.83795818 0.91240409 0.83063246
0.92787101 0.91281179 0.92710997 0.92709446]
mean value: 0.8973671087701672
key: test_accuracy
value: [0.8125 0.8125 0.93333333 0.6 0.93333333 0.93333333
0.66666667 0.86666667 0.8 0.93333333]
mean value: 0.8291666666666667
key: train_accuracy
value: [0.94117647 0.95588235 0.94890511 0.91240876 0.95620438 0.91240876
0.96350365 0.95620438 0.96350365 0.96350365]
mean value: 0.9473701159295835
key: test_fscore
value: [0.82352941 0.82352941 0.93333333 0.57142857 0.92307692 0.92307692
0.70588235 0.88888889 0.84210526 0.93333333]
mean value: 0.8368184412766456
key: train_fscore
value: [0.93846154 0.95714286 0.95035461 0.9047619 0.95652174 0.90769231
0.96240602 0.95652174 0.96350365 0.96296296]
mean value: 0.9460329323884149
key: test_precision
value: [0.77777778 0.77777778 0.875 0.57142857 1. 1.
0.66666667 0.8 0.72727273 1. ]
mean value: 0.819592352092352
key: train_precision
value: [0.98387097 0.93055556 0.93055556 1. 0.95652174 0.96721311
0.98461538 0.94285714 0.95652174 0.97014925]
mean value: 0.9622860453071885
key: test_recall
value: [0.875 0.875 1. 0.57142857 0.85714286 0.85714286
0.75 1. 1. 0.875 ]
mean value: 0.8660714285714286
key: train_recall
value: [0.89705882 0.98529412 0.97101449 0.82608696 0.95652174 0.85507246
0.94117647 0.97058824 0.97058824 0.95588235]
mean value: 0.9329283887468031
key: test_roc_auc
value: [0.8125 0.8125 0.9375 0.59821429 0.92857143 0.92857143
0.66071429 0.85714286 0.78571429 0.9375 ]
mean value: 0.8258928571428572
key: train_roc_auc
value: [0.94117647 0.95588235 0.94874254 0.91304348 0.95620205 0.91283035
0.96334186 0.95630861 0.96355499 0.96344842]
mean value: 0.9474531116794545
key: test_jcc
value: [0.7 0.7 0.875 0.4 0.85714286 0.85714286
0.54545455 0.8 0.72727273 0.875 ]
mean value: 0.7337012987012987
key: train_jcc
value: [0.88405797 0.91780822 0.90540541 0.82608696 0.91666667 0.83098592
0.92753623 0.91666667 0.92957746 0.92857143]
mean value: 0.8983362926190229
MCC on Blind test: 0.05
Accuracy on Blind test: 0.63
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01016855 0.0098474 0.0079248 0.00727248 0.00720954 0.00730157
0.00728512 0.0073278 0.00719166 0.00727534]
mean value: 0.007880425453186036
key: score_time
value: [0.010952 0.00936007 0.00861073 0.00789762 0.00792098 0.00781727
0.00834227 0.00790691 0.00788283 0.00789428]
mean value: 0.008458495140075684
key: test_mcc
value: [0.57735027 0.8819171 0.875 0.33928571 0.87287156 0.87287156
0.33928571 0.37796447 0.46428571 0.875 ]
mean value: 0.6475832110632131
key: train_mcc
value: [0.63408348 0.8979331 0.77817796 0.83063246 0.92951942 0.81712461
0.85977656 0.72794365 0.85721269 0.88920184]
mean value: 0.822160576316637
key: test_accuracy
value: [0.75 0.9375 0.93333333 0.66666667 0.93333333 0.93333333
0.66666667 0.66666667 0.73333333 0.93333333]
mean value: 0.8154166666666667
key: train_accuracy
value: [0.78676471 0.94852941 0.88321168 0.91240876 0.96350365 0.90510949
0.9270073 0.84671533 0.9270073 0.94160584]
mean value: 0.9041863460712752
key: test_fscore
value: [0.8 0.93333333 0.93333333 0.66666667 0.92307692 0.92307692
0.66666667 0.61538462 0.75 0.93333333]
mean value: 0.8144871794871795
key: train_fscore
value: [0.82424242 0.94736842 0.89333333 0.90769231 0.96240602 0.91156463
0.921875 0.8173913 0.92307692 0.9375 ]
mean value: 0.904645035463338
key: test_precision
value: [0.66666667 1. 0.875 0.625 1. 1.
0.71428571 0.8 0.75 1. ]
mean value: 0.8430952380952381
key: train_precision
value: [0.70103093 0.96923077 0.82716049 0.96721311 1. 0.85897436
0.98333333 1. 0.96774194 1. ]
mean value: 0.9274684933438643
key: test_recall
value: [1. 0.875 1. 0.71428571 0.85714286 0.85714286
0.625 0.5 0.75 0.875 ]
mean value: 0.8053571428571429
key: train_recall
value: [1. 0.92647059 0.97101449 0.85507246 0.92753623 0.97101449
0.86764706 0.69117647 0.88235294 0.88235294]
mean value: 0.8974637681159421
key: test_roc_auc
value: [0.75 0.9375 0.9375 0.66964286 0.92857143 0.92857143
0.66964286 0.67857143 0.73214286 0.9375 ]
mean value: 0.8169642857142857
key: train_roc_auc
value: [0.78676471 0.94852941 0.88256607 0.91283035 0.96376812 0.90462489
0.92657715 0.84558824 0.92668372 0.94117647]
mean value: 0.9039109121909633
key: test_jcc
value: [0.66666667 0.875 0.875 0.5 0.85714286 0.85714286
0.5 0.44444444 0.6 0.875 ]
mean value: 0.7050396825396825
key: train_jcc
value: [0.70103093 0.9 0.80722892 0.83098592 0.92753623 0.8375
0.85507246 0.69117647 0.85714286 0.88235294]
mean value: 0.8290026723550397
MCC on Blind test: 0.06
Accuracy on Blind test: 0.89
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.07523441 0.06551671 0.06416392 0.06420016 0.06557775 0.06523657
0.06472826 0.06670904 0.06583929 0.06667423]
mean value: 0.06638803482055664
key: score_time
value: [0.01517701 0.01486087 0.01571703 0.01545548 0.01541901 0.01526618
0.01506066 0.01570487 0.01489067 0.01541162]
mean value: 0.015296339988708496
key: test_mcc
value: [0.8819171 0.8819171 0.875 0.66143783 1. 0.87287156
1. 0.87287156 0.87287156 0.875 ]
mean value: 0.879388671797445
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.9375 0.93333333 0.8 1. 0.93333333
1. 0.93333333 0.93333333 0.93333333]
mean value: 0.9341666666666667
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94117647 0.94117647 0.93333333 0.82352941 1. 0.92307692
1. 0.94117647 0.94117647 0.93333333]
mean value: 0.9377978883861237
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88888889 0.88888889 0.875 0.7 1. 1.
1. 0.88888889 0.88888889 1. ]
mean value: 0.9130555555555555
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.9375 0.9375 0.8125 1. 0.92857143
1. 0.92857143 0.92857143 0.9375 ]
mean value: 0.9348214285714286
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88888889 0.88888889 0.875 0.7 1. 0.85714286
1. 0.88888889 0.88888889 0.875 ]
mean value: 0.8862698412698412
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.76
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.03459382 0.04311633 0.02597976 0.02609849 0.03556275 0.03000331
0.02980828 0.03237772 0.04908109 0.03363228]
mean value: 0.034025382995605466
key: score_time
value: [0.03137994 0.01657486 0.01867056 0.01809192 0.03612328 0.02216148
0.02189708 0.01990652 0.03687644 0.01487947]
mean value: 0.023656153678894044
key: test_mcc
value: [1. 0.8819171 0.875 0.76376262 0.87287156 0.87287156
1. 1. 1. 0.875 ]
mean value: 0.9141422841402109
key: train_mcc
value: [1. 1. 0.98550418 1. 0.98550725 1.
1. 0.98550418 1. 0.98550725]
mean value: 0.9942022851330479
key: test_accuracy
value: [1. 0.9375 0.93333333 0.86666667 0.93333333 0.93333333
1. 1. 1. 0.93333333]
mean value: 0.95375
key: train_accuracy
value: [1. 1. 0.99270073 1. 0.99270073 1.
1. 0.99270073 1. 0.99270073]
mean value: 0.997080291970803
key: test_fscore
value: [1. 0.94117647 0.93333333 0.875 0.92307692 0.92307692
1. 1. 1. 0.93333333]
mean value: 0.9528996983408748
key: train_fscore
value: [1. 1. 0.99280576 1. 0.99270073 1.
1. 0.99259259 1. 0.99270073]
mean value: 0.9970799807842291
key: test_precision
value: [1. 0.88888889 0.875 0.77777778 1. 1.
1. 1. 1. 1. ]
mean value: 0.9541666666666666
key: train_precision
value: [1. 1. 0.98571429 1. 1. 1.
1. 1. 1. 0.98550725]
mean value: 0.9971221532091097
key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9589285714285715
key: train_recall
value: [1. 1. 1. 1. 0.98550725 1.
1. 0.98529412 1. 1. ]
mean value: 0.997080136402387
key: test_roc_auc
value: [1. 0.9375 0.9375 0.875 0.92857143 0.92857143
1. 1. 1. 0.9375 ]
mean value: 0.9544642857142858
key: train_roc_auc
value: [1. 1. 0.99264706 1. 0.99275362 1.
1. 0.99264706 1. 0.99275362]
mean value: 0.997080136402387
key: test_jcc
value: [1. 0.88888889 0.875 0.77777778 0.85714286 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9130952380952381
key: train_jcc
value: [1. 1. 0.98571429 1. 0.98550725 1.
1. 0.98529412 1. 0.98550725]
mean value: 0.9942022896114968
MCC on Blind test: 0.12
Accuracy on Blind test: 0.84
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.03075981 0.03832865 0.06963396 0.06718922 0.03995085 0.03923106
0.03885245 0.061131 0.04299402 0.03564787]
mean value: 0.04637188911437988
key: score_time
value: [0.02218819 0.01115394 0.01116896 0.03056479 0.02163672 0.02096963
0.02151918 0.03110862 0.01723385 0.01872468]
mean value: 0.02062685489654541
key: test_mcc
value: [0.67419986 0.75 0.87287156 0.37796447 1. 0.73214286
0.46428571 0.46428571 1. 0.76376262]
mean value: 0.7099512797956697
key: train_mcc
value: [0.95598573 0.98540068 0.97080136 0.95630861 0.97080136 0.95630861
0.97080136 0.97080136 0.97080136 0.97080136]
mean value: 0.9678811811884551
key: test_accuracy
value: [0.8125 0.875 0.93333333 0.66666667 1. 0.86666667
0.73333333 0.73333333 1. 0.86666667]
mean value: 0.84875
key: train_accuracy
value: [0.97794118 0.99264706 0.98540146 0.97810219 0.98540146 0.97810219
0.98540146 0.98540146 0.98540146 0.98540146]
mean value: 0.9839201373980249
key: test_fscore
value: [0.84210526 0.875 0.92307692 0.70588235 1. 0.85714286
0.75 0.75 1. 0.85714286]
mean value: 0.8560350253461708
key: train_fscore
value: [0.97777778 0.99259259 0.98550725 0.97810219 0.98550725 0.97810219
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.9838765713274273
key: test_precision
value: [0.72727273 0.875 1. 0.6 1. 0.85714286
0.75 0.75 1. 1. ]
mean value: 0.8559415584415584
key: train_precision
value: [0.98507463 1. 0.98550725 0.98529412 0.98550725 0.98529412
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.9867853825501648
key: test_recall
value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286
0.75 0.75 1. 0.75 ]
mean value: 0.8696428571428572
key: train_recall
value: [0.97058824 0.98529412 0.98550725 0.97101449 0.98550725 0.97101449
0.98529412 0.98529412 0.98529412 0.98529412]
mean value: 0.9810102301790282
key: test_roc_auc
value: [0.8125 0.875 0.92857143 0.67857143 1. 0.86607143
0.73214286 0.73214286 1. 0.875 ]
mean value: 0.85
key: train_roc_auc
value: [0.97794118 0.99264706 0.98540068 0.97815431 0.98540068 0.97815431
0.98540068 0.98540068 0.98540068 0.98540068]
mean value: 0.9839300937766412
key: test_jcc
value: [0.72727273 0.77777778 0.85714286 0.54545455 1. 0.75
0.6 0.6 1. 0.75 ]
mean value: 0.7607647907647908
key: train_jcc
value: [0.95652174 0.98529412 0.97142857 0.95714286 0.97142857 0.95714286
0.97101449 0.97101449 0.97101449 0.97101449]
mean value: 0.9683016684934843
MCC on Blind test: 0.06
Accuracy on Blind test: 0.66
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.09868574 0.10024285 0.09119558 0.08083391 0.09357262 0.08993793
0.10161471 0.10149956 0.09327483 0.08393335]
mean value: 0.0934791088104248
key: score_time
value: [0.00927162 0.00913954 0.00923514 0.00928712 0.00947499 0.00919628
0.00924039 0.00923944 0.00928307 0.00930262]
mean value: 0.009267020225524902
key: test_mcc
value: [1. 0.8819171 0.875 0.76376262 1. 0.87287156
1. 1. 0.87287156 0.73214286]
mean value: 0.8998565698544966
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [1. 0.9375 0.93333333 0.86666667 1. 0.93333333
1. 1. 0.93333333 0.86666667]
mean value: 0.9470833333333334
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [1. 0.94117647 0.93333333 0.875 1. 0.92307692
1. 1. 0.94117647 0.875 ]
mean value: 0.9488763197586727
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [1. 0.88888889 0.875 0.77777778 1. 1.
1. 1. 0.88888889 0.875 ]
mean value: 0.9305555555555556
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 1. 1. 1. 0.85714286
1. 1. 1. 0.875 ]
mean value: 0.9732142857142857
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [1. 0.9375 0.9375 0.875 1. 0.92857143
1. 1. 0.92857143 0.86607143]
mean value: 0.9473214285714285
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [1. 0.88888889 0.875 0.77777778 1. 0.85714286
1. 1. 0.88888889 0.77777778]
mean value: 0.906547619047619
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.83
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.00916886 0.01091504 0.01081634 0.01080394 0.0134356 0.02716422
0.01087403 0.01102185 0.01133323 0.01125884]
mean value: 0.012679195404052735
key: score_time
value: [0.01023698 0.01037884 0.01042628 0.01103234 0.01079631 0.01123476
0.01329875 0.01280212 0.01062632 0.01068163]
mean value: 0.011151432991027832
key: test_mcc
value: [0.8819171 0.67419986 0.75592895 0.75592895 0.75592895 0.53452248
0.37796447 0.76376262 0.76376262 0.76376262]
mean value: 0.7027678608518798
key: train_mcc
value: [1. 0.90184995 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9901849950564579
key: test_accuracy
value: [0.9375 0.8125 0.86666667 0.86666667 0.86666667 0.73333333
0.66666667 0.86666667 0.86666667 0.86666667]
mean value: 0.835
key: train_accuracy
value: [1. 0.94852941 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9948529411764706
key: test_fscore
value: [0.94117647 0.76923077 0.83333333 0.83333333 0.83333333 0.6
0.61538462 0.85714286 0.85714286 0.85714286]
mean value: 0.7997220426632191
key: train_fscore
value: [1. 0.94573643 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9945736434108527
key: test_precision
value: [0.88888889 1. 1. 1. 1. 1.
0.8 1. 1. 1. ]
mean value: 0.9688888888888889
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143
0.5 0.75 0.75 0.75 ]
mean value: 0.6946428571428571
key: train_recall
value: [1. 0.89705882 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9897058823529412
key: test_roc_auc
value: [0.9375 0.8125 0.85714286 0.85714286 0.85714286 0.71428571
0.67857143 0.875 0.875 0.875 ]
mean value: 0.8339285714285715
key: train_roc_auc
value: [1. 0.94852941 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9948529411764706
key: test_jcc
value: [0.88888889 0.625 0.71428571 0.71428571 0.71428571 0.42857143
0.44444444 0.75 0.75 0.75 ]
mean value: 0.6779761904761905
key: train_jcc
value: [1. 0.89705882 1. 1. 1. 1.
1. 1. 1. 1. ]
mean value: 0.9897058823529412
MCC on Blind test: -0.02
Accuracy on Blind test: 0.95
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01154399 0.01006269 0.00780892 0.00763559 0.00742674 0.00739622
0.00755072 0.00743032 0.00746632 0.00746202]
mean value: 0.008178353309631348
key: score_time
value: [0.01060176 0.00935245 0.00819874 0.00818491 0.00788951 0.00785613
0.00786829 0.00791764 0.00788474 0.0078702 ]
mean value: 0.008362436294555664
key: test_mcc
value: [0.75 0.62994079 0.73214286 0.49099025 0.87287156 0.87287156
0.64465837 0.6000992 0.64465837 0.875 ]
mean value: 0.7113232961000079
key: train_mcc
value: [0.83832595 0.86849267 0.85434012 0.91240409 0.86868474 0.8978896
0.88360693 0.82480818 0.86948194 0.8555278 ]
mean value: 0.8673562022561286
key: test_accuracy
value: [0.875 0.8125 0.86666667 0.73333333 0.93333333 0.93333333
0.8 0.8 0.8 0.93333333]
mean value: 0.84875
key: train_accuracy
value: [0.91911765 0.93382353 0.9270073 0.95620438 0.93430657 0.94890511
0.94160584 0.91240876 0.93430657 0.9270073 ]
mean value: 0.9334693001288106
key: test_fscore
value: [0.875 0.82352941 0.85714286 0.75 0.92307692 0.92307692
0.84210526 0.82352941 0.84210526 0.93333333]
mean value: 0.8592899386475238
key: train_fscore
value: [0.91970803 0.9352518 0.92857143 0.95652174 0.9352518 0.94964029
0.94202899 0.91176471 0.9352518 0.92857143]
mean value: 0.9342562000313209
key: test_precision
value: [0.875 0.77777778 0.85714286 0.66666667 1. 1.
0.72727273 0.77777778 0.72727273 1. ]
mean value: 0.8408910533910534
key: train_precision
value: [0.91304348 0.91549296 0.91549296 0.95652174 0.92857143 0.94285714
0.92857143 0.91176471 0.91549296 0.90277778]
mean value: 0.9230586574290872
key: test_recall
value: [0.875 0.875 0.85714286 0.85714286 0.85714286 0.85714286
1. 0.875 1. 0.875 ]
mean value: 0.8928571428571428
key: train_recall
value: [0.92647059 0.95588235 0.94202899 0.95652174 0.94202899 0.95652174
0.95588235 0.91176471 0.95588235 0.95588235]
mean value: 0.9458866155157716
key: test_roc_auc
value: [0.875 0.8125 0.86607143 0.74107143 0.92857143 0.92857143
0.78571429 0.79464286 0.78571429 0.9375 ]
mean value: 0.8455357142857143
key: train_roc_auc
value: [0.91911765 0.93382353 0.92689685 0.95620205 0.93424979 0.9488491
0.94170929 0.91240409 0.93446292 0.92721654]
mean value: 0.933493179880648
key: test_jcc
value: [0.77777778 0.7 0.75 0.6 0.85714286 0.85714286
0.72727273 0.7 0.72727273 0.875 ]
mean value: 0.7571608946608946
key: train_jcc
value: [0.85135135 0.87837838 0.86666667 0.91666667 0.87837838 0.90410959
0.89041096 0.83783784 0.87837838 0.86666667]
mean value: 0.876884487226953
MCC on Blind test: 0.07
Accuracy on Blind test: 0.7
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr',
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf',
'logorI', 'lineage_proportion', 'dist_lineage_proportion',
'lineage_count_all', 'lineage_count_unique'],
dtype='object')),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.07331181 0.06256533 0.06086087 0.06129169 0.06339002 0.06102061
0.06144905 0.06290483 0.06220293 0.06425667]
mean value: 0.0633253812789917
key: score_time
value: [0.00836086 0.00896025 0.00843644 0.00837779 0.00837159 0.00880218
0.00834155 0.00841522 0.00857377 0.00888371]
mean value: 0.008552336692810058
key: test_mcc
value: [0.75 0.62994079 0.73214286 0.66143783 0.87287156 0.87287156
0.64465837 0.6000992 0.64465837 0.875 ]
mean value: 0.7283680535735243
key: train_mcc
value: [0.83832595 0.87000211 0.88466669 0.91240409 0.86868474 0.89863497
0.90025835 0.88476385 0.9139999 0.84173622]
mean value: 0.8813476865607188
key: test_accuracy
value: [0.875 0.8125 0.86666667 0.8 0.93333333 0.93333333
0.8 0.8 0.8 0.93333333]
mean value: 0.8554166666666667
key: train_accuracy
value: [0.91911765 0.93382353 0.94160584 0.95620438 0.93430657 0.94890511
0.94890511 0.94160584 0.95620438 0.91970803]
mean value: 0.940038643194504
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:203: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:206: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_fscore
value: [0.875 0.82352941 0.85714286 0.82352941 0.92307692 0.92307692
0.84210526 0.82352941 0.84210526 0.93333333]
mean value: 0.8666428798239943
key: train_fscore
value: [0.91970803 0.93617021 0.94366197 0.95652174 0.9352518 0.95035461
0.95035461 0.94285714 0.95714286 0.92198582]
mean value: 0.9414008786946603
key: test_precision
value: [0.875 0.77777778 0.85714286 0.7 1. 1.
0.72727273 0.77777778 0.72727273 1. ]
mean value: 0.8442243867243867
key: train_precision
value: [0.91304348 0.90410959 0.91780822 0.95652174 0.92857143 0.93055556
0.91780822 0.91666667 0.93055556 0.89041096]
mean value: 0.920605141004188
key: test_recall
value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286
1. 0.875 1. 0.875 ]
mean value: 0.9071428571428571
key: train_recall
value: [0.92647059 0.97058824 0.97101449 0.95652174 0.94202899 0.97101449
0.98529412 0.97058824 0.98529412 0.95588235]
mean value: 0.9634697357203751
key: test_roc_auc
value: [0.875 0.8125 0.86607143 0.8125 0.92857143 0.92857143
0.78571429 0.79464286 0.78571429 0.9375 ]
mean value: 0.8526785714285714
key: train_roc_auc
value: [0.91911765 0.93382353 0.9413896 0.95620205 0.93424979 0.94874254
0.9491688 0.94181586 0.95641517 0.91997016]
mean value: 0.9400895140664962
key: test_jcc
value: [0.77777778 0.7 0.75 0.7 0.85714286 0.85714286
0.72727273 0.7 0.72727273 0.875 ]
mean value: 0.7671608946608947
key: train_jcc
value: [0.85135135 0.88 0.89333333 0.91666667 0.87837838 0.90540541
0.90540541 0.89189189 0.91780822 0.85526316]
mean value: 0.8895503809505252
MCC on Blind test: 0.06
Accuracy on Blind test: 0.66