LSHTM_analysis/scripts/ml/log_katg_cd_sl.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_sl.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
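The warning above is raised because `mask_check` appears to be a slice of another DataFrame that is then sorted in place. A minimal sketch of one common way to avoid it, assuming `mask_check` comes from a boolean filter (the filter, the threshold of 10, and the sample rows are made up for illustration and are not taken from ml_data_cd_sl.py):

```python
import pandas as pd

# Made-up rows; only the column names echo the log.
df = pd.DataFrame({
    "mutationinformation": ["A1B", "C2D", "E3F", "G4H"],
    "ligand_distance": [12.5, 3.1, 8.4, 5.0],
})

# Take an explicit copy of the filtered rows (hypothetical filter),
# then sort without inplace=True so no view of df is ever modified.
mask_check = df[df["ligand_distance"] < 10].copy()
mask_check = mask_check.sort_values(by=["ligand_distance"], ascending=True)

print(mask_check["ligand_distance"].tolist())  # [3.1, 5.0, 8.4]
```

The original frame keeps its row order, so later steps that rely on `df` are unaffected.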
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 817
PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation
or_mychisq 244
log10_or_mychisq 244
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data with stratification according to scaling law [COMPLETE data]: 1/sqrt(x_ncols)
Input features data size: (817, 175)
Train data size: (755, 175)
Test data size: (62, 175)
y_train numbers: Counter({0: 437, 1: 318})
y_train ratio: 1.3742138364779874
y_test_numbers: Counter({0: 36, 1: 26})
y_test ratio: 1.3846153846153846
-------------------------------------------------------------
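The split sizes above follow from the logged rule: with 175 feature columns the test fraction is 1/sqrt(175), roughly 0.0756, which on 817 rows gives 62 test rows and 755 training rows. A minimal sketch of that arithmetic, assuming the test count is rounded up (scikit-learn's `train_test_split` does this for a fractional `test_size`); the function name is illustrative and does not come from the script:

```python
import math


def scaling_law_split(n_rows: int, n_cols: int) -> tuple[int, int]:
    """Return (n_train, n_test) for a test fraction of 1/sqrt(n_cols)."""
    test_fraction = 1 / math.sqrt(n_cols)
    n_test = math.ceil(test_fraction * n_rows)  # assumption: round up
    return n_rows - n_test, n_test


n_train, n_test = scaling_law_split(n_rows=817, n_cols=175)
print(n_train, n_test)  # 755 62
```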
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({0: 437, 1: 318}) Data dim: (755, 175)
Simple Random OverSampling
Counter({0: 437, 1: 437})
(874, 175)
Simple Random UnderSampling
Counter({0: 318, 1: 318})
(636, 175)
Simple Combined Over and UnderSampling
Counter({0: 437, 1: 437})
(874, 175)
SMOTE_NC OverSampling
Counter({0: 437, 1: 437})
(874, 175)
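The class counts above can be reproduced with plain random resampling: oversampling duplicates minority rows until both classes reach 437, and undersampling keeps 318 rows of each class. The log does not name the library used, so the following is a standard-library sketch of the two simple strategies only (it does not implement SMOTE-NC, and the function names are hypothetical):

```python
import random
from collections import Counter


def random_oversample(labels, seed=42):
    """Return row indices with minority rows duplicated until classes balance."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        pool = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.choices(pool, k=target - n)  # sample with replacement
    return indices


def random_undersample(labels, seed=42):
    """Return row indices with every class cut down to the minority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        pool = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.sample(pool, k=target)  # sample without replacement
    return sorted(indices)


y_train = [0] * 437 + [1] * 318  # class counts taken from the log
over = random_oversample(y_train)
under = random_undersample(y_train)
print(Counter(y_train[i] for i in over), len(over))    # 437 per class, 874 rows
print(Counter(y_train[i] for i in under), len(under))  # 318 per class, 636 rows
```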
#####################################################################
Running ML analysis [COMPLETE DATA]: scaling law split, test fraction 1/sqrt(x_ncols)
Gene name: katG
Drug name: isoniazid
Output directory: /home/tanu/git/Data/isoniazid/output/ml/tts_cd_sl/
Sanity checks:
Total input features: 175
Training data size: (755, 175)
Test data size: (62, 175)
Target feature numbers (training data): Counter({0: 437, 1: 318})
Target features ratio (training data): 1.3742138364779874
Target feature numbers (test data): Counter({0: 36, 1: 26})
Target features ratio (test data): 1.3846153846153846
#####################################################################
================================================================
Structural features (n): 36
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
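The check passes because the five group sizes listed above add up to the 175 input columns, and the four non-categorical groups add up to the 168 numerical features reported earlier. A small sketch of that tally (the dictionary keys are illustrative labels, not variables from the script):

```python
# Group sizes as printed in the log.
group_sizes = {
    "structural": 36,
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}

total = sum(group_sizes.values())
numerical = total - group_sizes["categorical"]
print(total, numerical)  # 175 168
```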
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
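The pipeline printed above scales the numerical columns with `MinMaxScaler`, one-hot encodes the categorical columns, and fits the model. The result keys that follow (`fit_time`, `test_mcc`, `train_mcc`, ...) match what scikit-learn's `cross_validate` returns. A hedged sketch of that setup on synthetic data: the rows, the `ss_class` values, the 10-fold stratified CV object, and `max_iter=1000` are all assumptions (the logged model uses the default iteration limit, which is what triggers the ConvergenceWarning).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data; only the column names echo the log.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 15).astype(int)

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "ddg_foldx"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[
    ("prep", prep),
    # Assumption: a higher max_iter than the default to let lbfgs converge.
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv,
    scoring={"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"},
    return_train_score=True,
)
for key, value in scores.items():
    print("key:", key, "mean value:", value.mean())
```

Keeping the scaler and encoder inside the pipeline means they are refit on each training fold, so no information from the held-out fold leaks into preprocessing.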
key: fit_time
value: [0.03845477 0.04111099 0.04018474 0.04097939 0.03893089 0.03982425
0.04032898 0.0402391 0.03961587 0.0398469 ]
mean value: 0.03995158672332764
key: score_time
value: [0.01270437 0.01245141 0.01341653 0.01328707 0.01330495 0.01333356
0.01330638 0.01323581 0.01337004 0.01323414]
mean value: 0.013164424896240234
key: test_mcc
value: [0.41185791 0.51905381 0.41185791 0.37674326 0.70337995 0.51056179
0.48661327 0.54396846 0.69927678 0.61631563]
mean value: 0.5279628780527089
key: train_mcc
value: [0.64996517 0.64392978 0.665228 0.65217074 0.63909822 0.64501189
0.64719164 0.67987312 0.62943216 0.65474146]
mean value: 0.650664217640774
key: test_accuracy
value: [0.71052632 0.76315789 0.71052632 0.69736842 0.85526316 0.76
0.74666667 0.77333333 0.85333333 0.81333333]
mean value: 0.7683508771929825
key: train_accuracy
value: [0.82916053 0.82621502 0.8365243 0.83063328 0.82326951 0.82647059
0.82647059 0.84411765 0.81911765 0.83088235]
mean value: 0.8292861474486701
key: test_fscore
value: [0.66666667 0.72727273 0.66666667 0.63492063 0.81355932 0.71875
0.70769231 0.74626866 0.82539683 0.77419355]
mean value: 0.7281387355753242
key: train_fscore
value: [0.79790941 0.79442509 0.80695652 0.79789104 0.79310345 0.79584775
0.79931973 0.81403509 0.78608696 0.8020654 ]
mean value: 0.7987640429167654
key: test_precision
value: [0.64705882 0.70588235 0.64705882 0.64516129 0.88888889 0.6969697
0.67647059 0.71428571 0.83870968 0.8 ]
mean value: 0.726048585612153
key: train_precision
value: [0.79513889 0.79166667 0.80276817 0.80212014 0.78231293 0.79037801
0.7807309 0.81690141 0.78200692 0.78983051]
mean value: 0.793385452938167
key: test_recall
value: [0.6875 0.75 0.6875 0.625 0.75 0.74193548
0.74193548 0.78125 0.8125 0.75 ]
mean value: 0.7327620967741936
key: train_recall
value: [0.8006993 0.7972028 0.81118881 0.79370629 0.8041958 0.80139373
0.81881533 0.81118881 0.79020979 0.81468531]
mean value: 0.8043285982310373
key: test_roc_auc
value: [0.70738636 0.76136364 0.70738636 0.6875 0.84090909 0.75733138
0.74596774 0.77434593 0.84811047 0.80523256]
mean value: 0.7635533528268431
key: train_roc_auc
value: [0.82528604 0.82226552 0.83307532 0.82560633 0.82067297 0.82308872
0.8254382 0.83960456 0.81515566 0.82866245]
mean value: 0.8258855762883787
key: test_jcc
value: [0.5 0.57142857 0.5 0.46511628 0.68571429 0.56097561
0.54761905 0.5952381 0.7027027 0.63157895]
mean value: 0.5760373538896989
key: train_jcc
value: [0.66376812 0.65895954 0.67638484 0.66374269 0.65714286 0.66091954
0.66572238 0.68639053 0.64756447 0.66954023]
mean value: 0.6650135192542527
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
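The per-fold values above are all functions of a confusion matrix. As a worked check, the first fold's reported values are consistent with TP=22, FP=12, FN=10, TN=32 on a 76-row fold; these counts are inferred from the logged metrics and are not printed by the script. The sketch below recomputes each metric from those counts. The `roc_auc` line assumes the score was computed from hard class predictions, in which case it equals the mean of recall and specificity.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the logged metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "roc_auc": (recall + specificity) / 2,  # assumes hard predictions
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Inferred counts for the first cross-validation fold (an assumption).
fold1 = binary_metrics(tp=22, fp=12, fn=10, tn=32)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

The Jaccard score and F-score are linked by jcc = fscore / (2 - fscore), which is why the two columns always rank folds in the same order.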
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
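Note on the pipeline printed above: it has the shape ColumnTransformer(MinMaxScaler on numerical columns, OneHotEncoder on categorical columns) followed by a model, and the "key: / value: / mean value:" lines that follow match the dictionary returned by scikit-learn's cross_validate with return_train_score=True. The sketch below reproduces that shape on a tiny made-up DataFrame. The data, the three-column feature set, max_iter=1000, handle_unknown="ignore" and the scorer selection are assumptions, not the settings of the original script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up data (assumption): two numerical and one categorical feature.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "ligand_distance": rng.uniform(2, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = (X["ligand_distance"] + rng.normal(0, 5, n) < 15).astype(int)

# Same layout as the logged pipeline: scale numerical, one-hot categorical.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "ddg_foldx"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["ss_class"]),
    ],
)
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegressionCV(random_state=42, max_iter=1000)),
])

# A subset of the metrics that appear in the log, keyed by the same suffixes.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```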
key: fit_time
value: [0.8990736 1.00928378 0.92249584 1.06357527 0.90100861 1.05567336
0.92924047 0.98903608 0.91987348 0.93955755]
mean value: 0.9628818035125732
key: score_time
value: [0.01704144 0.0151968 0.01524448 0.01529384 0.01529455 0.01550865
0.01517725 0.01560426 0.01515341 0.01526475]
mean value: 0.015477943420410156
key: test_mcc
value: [0.49527383 0.45626404 0.4822 0.48956862 0.6198304 0.60266409
0.75402183 0.52770861 0.64361974 0.45993751]
mean value: 0.5531088664051463
key: train_mcc
value: [0.78145361 0.76267653 0.76943669 0.76836323 0.77438554 0.77167332
0.78028857 0.71665959 0.74490432 0.78769765]
mean value: 0.7657539050746246
key: test_accuracy
value: [0.75 0.73684211 0.75 0.75 0.81578947 0.8
0.88 0.76 0.82666667 0.73333333]
mean value: 0.7802631578947369
key: train_accuracy
value: [0.89248895 0.88365243 0.88659794 0.88659794 0.88954345 0.88823529
0.89264706 0.86176471 0.875 0.89558824]
mean value: 0.8852116001039592
key: test_fscore
value: [0.71641791 0.67741935 0.68852459 0.70769231 0.77419355 0.7761194
0.85714286 0.74285714 0.78688525 0.6969697 ]
mean value: 0.742422205738622
key: train_fscore
value: [0.87521368 0.86402754 0.86837607 0.86701209 0.87046632 0.86896552
0.87348354 0.83623693 0.85370052 0.87863248]
mean value: 0.86561146749211
key: test_precision
value: [0.68571429 0.7 0.72413793 0.6969697 0.8 0.72222222
0.84375 0.68421053 0.82758621 0.67647059]
mean value: 0.7361061457388323
key: train_precision
value: [0.85618729 0.85084746 0.84949833 0.85665529 0.86006826 0.86006826
0.86896552 0.83333333 0.84067797 0.85953177]
mean value: 0.8535833474481594
key: test_recall
value: [0.75 0.65625 0.65625 0.71875 0.75 0.83870968
0.87096774 0.8125 0.75 0.71875 ]
mean value: 0.7522177419354839
key: train_recall
value: [0.8951049 0.87762238 0.88811189 0.87762238 0.88111888 0.87804878
0.87804878 0.83916084 0.86713287 0.8986014 ]
mean value: 0.8780573085451134
key: test_roc_auc
value: [0.75 0.72585227 0.73721591 0.74573864 0.80681818 0.80571848
0.87866569 0.76671512 0.81686047 0.73146802]
mean value: 0.7765052768874037
key: train_roc_auc
value: [0.89284507 0.88283155 0.88680404 0.88537607 0.88839659 0.88686154
0.89067833 0.85866671 0.87392176 0.89600121]
mean value: 0.8842382873178545
key: test_jcc
value: [0.55813953 0.51219512 0.525 0.54761905 0.63157895 0.63414634
0.75 0.59090909 0.64864865 0.53488372]
mean value: 0.5933120453773796
key: train_jcc
value: [0.7781155 0.76060606 0.7673716 0.7652439 0.7706422 0.76829268
0.77538462 0.71856287 0.74474474 0.78353659]
mean value: 0.7632500770281704
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
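Note on reading the metrics above: each "mean value" is the arithmetic mean of the ten per-fold values, and for binary labels the Jaccard score (jcc) and the F-score are tied by J = F / (2 - F), since J = TP / (TP + FP + FN) and F = 2TP / (2TP + FP + FN). The sketch below checks both statements against the LogisticRegressionCV numbers printed above, using only the standard library. The variable names are illustrative.

```python
from statistics import mean

# Per-fold values copied from the LogisticRegressionCV block of this log.
test_mcc = [0.49527383, 0.45626404, 0.4822, 0.48956862, 0.6198304,
            0.60266409, 0.75402183, 0.52770861, 0.64361974, 0.45993751]
test_fscore = [0.71641791, 0.67741935, 0.68852459, 0.70769231, 0.77419355,
               0.7761194, 0.85714286, 0.74285714, 0.78688525, 0.6969697]
test_jcc = [0.55813953, 0.51219512, 0.525, 0.54761905, 0.63157895,
            0.63414634, 0.75, 0.59090909, 0.64864865, 0.53488372]

# The logged mean is the plain average over the ten folds.
mcc_mean = mean(test_mcc)
print("mean test_mcc:", round(mcc_mean, 6))

# Jaccard recovered from F-score, fold by fold.
jcc_from_f = [f / (2 - f) for f in test_fscore]
max_gap = max(abs(a - b) for a, b in zip(jcc_from_f, test_jcc))
print("largest |J - F/(2-F)| over folds:", max_gap)
```

Because the two metrics are monotonically related, ranking models by test_jcc or by test_fscore within a single fold gives the same order.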
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0166285 0.01156783 0.01255155 0.01241207 0.01223373 0.01212716
0.01170945 0.01129079 0.01182938 0.01191139]
mean value: 0.012426185607910156
key: score_time
value: [0.01285315 0.01049924 0.01024437 0.0101192 0.00957632 0.00992823
0.00999331 0.01004028 0.00996041 0.01002264]
mean value: 0.010323715209960938
key: test_mcc
value: [0.31980107 0.27449801 0.31694616 0.40625 0.56818182 0.54100373
0.34692019 0.37816125 0.45123109 0.39714245]
mean value: 0.4000135759249381
key: train_mcc
value: [0.44674155 0.46040905 0.44846705 0.45967633 0.44505203 0.43840782
0.43709404 0.41215522 0.43630676 0.43976523]
mean value: 0.44240750857578764
key: test_accuracy
value: [0.65789474 0.64473684 0.64473684 0.71052632 0.78947368 0.77333333
0.68 0.64 0.73333333 0.69333333]
mean value: 0.6967368421052632
key: train_accuracy
value: [0.72312224 0.73048601 0.72459499 0.7275405 0.72164948 0.71764706
0.71470588 0.68382353 0.72794118 0.71911765]
mean value: 0.7190628519449016
key: test_fscore
value: [0.62857143 0.58461538 0.64 0.65625 0.75 0.73846154
0.625 0.68235294 0.67741935 0.67605634]
mean value: 0.6658726985691701
key: train_fscore
value: [0.69381107 0.700491 0.69394435 0.70304976 0.69367909 0.69131833
0.69303797 0.68975469 0.66179159 0.6904376 ]
mean value: 0.6911315462615466
key: test_precision
value: [0.57894737 0.57575758 0.55813953 0.65625 0.75 0.70588235
0.60606061 0.54716981 0.7 0.61538462]
mean value: 0.6293591864769502
key: train_precision
value: [0.64939024 0.65846154 0.65230769 0.64985163 0.64652568 0.64179104
0.63478261 0.58722359 0.69348659 0.64350453]
mean value: 0.6457325148933183
key: test_recall
value: [0.6875 0.59375 0.75 0.65625 0.75 0.77419355
0.64516129 0.90625 0.65625 0.75 ]
mean value: 0.7169354838709677
key: train_recall
value: [0.74475524 0.74825175 0.74125874 0.76573427 0.74825175 0.74912892
0.7630662 0.83566434 0.63286713 0.74475524]
mean value: 0.7473733583489681
key: test_roc_auc
value: [0.66193182 0.63778409 0.65909091 0.703125 0.78409091 0.77346041
0.67485337 0.67405523 0.72347384 0.7005814 ]
mean value: 0.6992446975380209
key: train_roc_auc
value: [0.72606719 0.7329045 0.72686347 0.73273991 0.72527091 0.7218927
0.72122776 0.7046342 0.71491072 0.72263143]
mean value: 0.7229142789213228
key: test_jcc
value: [0.45833333 0.41304348 0.47058824 0.48837209 0.6 0.58536585
0.45454545 0.51785714 0.51219512 0.5106383 ]
mean value: 0.501093901079627
key: train_jcc
value: [0.53117207 0.53904282 0.53132832 0.54207921 0.53101737 0.52825553
0.53026634 0.52643172 0.49453552 0.52722772]
mean value: 0.52813566214748
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0125792 0.01233125 0.01156139 0.01170778 0.01114821 0.01106834
0.01238728 0.01232433 0.01200199 0.0123198 ]
mean value: 0.01194295883178711
key: score_time
value: [0.01004314 0.00967956 0.00956941 0.0099113 0.00907087 0.00946569
0.00997758 0.00993609 0.00979972 0.01003027]
mean value: 0.009748363494873047
key: test_mcc
value: [0.59365605 0.28140559 0.17447146 0.48519965 0.48956862 0.46313625
0.35577952 0.38895144 0.5603175 0.25810271]
mean value: 0.40505887826745546
key: train_mcc
value: [0.432963 0.46234808 0.46694919 0.46570418 0.44960003 0.44154422
0.43213035 0.44706869 0.45428806 0.45555535]
mean value: 0.450815115609883
key: test_accuracy
value: [0.80263158 0.64473684 0.59210526 0.75 0.75 0.73333333
0.69333333 0.69333333 0.78666667 0.62666667]
mean value: 0.707280701754386
key: train_accuracy
value: [0.72164948 0.73784978 0.73784978 0.73784978 0.72901325 0.72647059
0.71911765 0.72941176 0.73235294 0.73235294]
mean value: 0.7303917958936151
key: test_fscore
value: [0.76190476 0.59701493 0.53731343 0.6984127 0.70769231 0.6969697
0.59649123 0.66666667 0.73333333 0.6 ]
mean value: 0.6595799051258595
key: train_fscore
value: [0.67692308 0.68881119 0.69727891 0.69520548 0.68813559 0.68041237
0.68113523 0.68275862 0.68835616 0.69047619]
mean value: 0.686949282203034
key: test_precision
value: [0.77419355 0.57142857 0.51428571 0.70967742 0.6969697 0.65714286
0.65384615 0.62162162 0.78571429 0.55263158]
mean value: 0.6537511447698205
key: train_precision
value: [0.66220736 0.68881119 0.67880795 0.68120805 0.66776316 0.67118644
0.65384615 0.67346939 0.67449664 0.67218543]
mean value: 0.67239817623147
key: test_recall
value: [0.75 0.625 0.5625 0.6875 0.71875 0.74193548
0.5483871 0.71875 0.6875 0.65625 ]
mean value: 0.6696572580645161
key: train_recall
value: [0.69230769 0.68881119 0.71678322 0.70979021 0.70979021 0.68989547
0.71080139 0.69230769 0.7027972 0.70979021]
mean value: 0.702307448648912
key: test_roc_auc
value: [0.79545455 0.64204545 0.58806818 0.74147727 0.74573864 0.73460411
0.67192082 0.6965843 0.77398256 0.63045058]
mean value: 0.7020326459455773
key: train_roc_auc
value: [0.71765512 0.73117404 0.73498194 0.73402996 0.72639638 0.72153807
0.71799612 0.72432643 0.72830215 0.72926059]
mean value: 0.7265660801452282
key: test_jcc
value: [0.61538462 0.42553191 0.36734694 0.53658537 0.54761905 0.53488372
0.425 0.5 0.57894737 0.42857143]
mean value: 0.4959870400449163
key: train_jcc
value: [0.51162791 0.52533333 0.53524804 0.5328084 0.5245478 0.515625
0.5164557 0.51832461 0.52480418 0.52727273]
mean value: 0.523204769300403
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01447773 0.01143241 0.01164007 0.01019645 0.01123261 0.01122975
0.01132512 0.01041198 0.01103187 0.01116967]
mean value: 0.011414766311645508
key: score_time
value: [0.08108211 0.01340771 0.01348948 0.013273 0.01390386 0.01398015
0.01823759 0.01867867 0.01493979 0.01354218]
mean value: 0.021453452110290528
key: test_mcc
value: [0.46022727 0.39836355 0.22765527 0.39617931 0.45626404 0.06268839
0.30951038 0.2369186 0.42015928 0.28614654]
mean value: 0.32541126471942616
key: train_mcc
value: [0.52476721 0.56995237 0.5594584 0.54382548 0.53998842 0.56545966
0.5811401 0.54484286 0.55700453 0.55603493]
mean value: 0.554247395764335
key: test_accuracy
value: [0.73684211 0.71052632 0.63157895 0.71052632 0.73684211 0.56
0.66666667 0.62666667 0.72 0.65333333]
mean value: 0.6752982456140351
key: train_accuracy
value: [0.77025037 0.79086892 0.78645066 0.77908689 0.77761414 0.78970588
0.79705882 0.77941176 0.78529412 0.78529412]
mean value: 0.7841035692627567
key: test_fscore
value: [0.6875 0.63333333 0.51724138 0.62068966 0.67741935 0.4
0.59016393 0.5625 0.61818182 0.58064516]
mean value: 0.5887674636553172
key: train_fscore
value: [0.71532847 0.74911661 0.73967684 0.72924188 0.72394881 0.73952641
0.75090253 0.7311828 0.73835125 0.73454545]
mean value: 0.7351821047557114
key: test_precision
value: [0.6875 0.67857143 0.57692308 0.69230769 0.7 0.45833333
0.6 0.5625 0.73913043 0.6 ]
mean value: 0.629526596591814
key: train_precision
value: [0.7480916 0.75714286 0.7601476 0.75373134 0.75862069 0.77480916
0.77902622 0.75 0.75735294 0.76515152]
mean value: 0.7604073928472855
key: test_recall
value: [0.6875 0.59375 0.46875 0.5625 0.65625 0.35483871
0.58064516 0.5625 0.53125 0.5625 ]
mean value: 0.5560483870967742
key: train_recall
value: [0.68531469 0.74125874 0.72027972 0.70629371 0.69230769 0.70731707
0.72473868 0.71328671 0.72027972 0.70629371]
mean value: 0.7117370434443605
key: test_roc_auc
value: [0.73011364 0.69460227 0.609375 0.69034091 0.72585227 0.52969208
0.65395894 0.6184593 0.69585756 0.64171512]
mean value: 0.6589967094046238
key: train_roc_auc
value: [0.75868788 0.78411538 0.77744266 0.76917739 0.76600117 0.77859492
0.78730572 0.77034894 0.77638351 0.77446665]
mean value: 0.7742524227309505
key: test_jcc
value: [0.52380952 0.46341463 0.34883721 0.45 0.51219512 0.25
0.41860465 0.39130435 0.44736842 0.40909091]
mean value: 0.4214624818341829
key: train_jcc
value: [0.55681818 0.59887006 0.58689459 0.57386364 0.56733524 0.5867052
0.60115607 0.57627119 0.58522727 0.58045977]
mean value: 0.5813601206085782
MCC on Blind test: 0.25
Accuracy on Blind test: 0.65
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03710723 0.03756094 0.04211354 0.03792882 0.03758621 0.03751993
0.03698134 0.03782248 0.03819609 0.03786802]
mean value: 0.03806846141815186
key: score_time
value: [0.01552296 0.0156126 0.0167017 0.01760411 0.01570702 0.01563406
0.01553059 0.01514482 0.01581478 0.01570559]
mean value: 0.01589782238006592
key: test_mcc
value: [0.43580096 0.35227273 0.42733892 0.39836355 0.64678324 0.53504163
0.39516129 0.51409028 0.56404163 0.47754813]
mean value: 0.4746442352543855
key: train_mcc
value: [0.59772146 0.6009229 0.61621456 0.59600332 0.57394829 0.5881027
0.60569092 0.59718466 0.58048551 0.59083661]
mean value: 0.5947110918550393
key: test_accuracy
value: [0.72368421 0.68421053 0.72368421 0.71052632 0.82894737 0.77333333
0.70666667 0.76 0.78666667 0.74666667]
mean value: 0.7444385964912281
key: train_accuracy
value: [0.80412371 0.80559647 0.81296024 0.80412371 0.79381443 0.8
0.80735294 0.80441176 0.79705882 0.80147059]
mean value: 0.803091267434809
key: test_fscore
value: [0.67692308 0.625 0.6557377 0.63333333 0.78688525 0.73015873
0.64516129 0.72727273 0.71428571 0.68852459]
mean value: 0.688328241327977
key: train_fscore
value: [0.76625659 0.76842105 0.77758319 0.76122083 0.74545455 0.75800712
0.7729636 0.76376554 0.74909091 0.75935829]
mean value: 0.7622121663731163
key: test_precision
value: [0.66666667 0.625 0.68965517 0.67857143 0.82758621 0.71875
0.64516129 0.70588235 0.83333333 0.72413793]
mean value: 0.7114744382180014
key: train_precision
value: [0.77031802 0.77112676 0.77894737 0.78228782 0.77651515 0.77454545
0.76896552 0.77617329 0.78030303 0.77454545]
mean value: 0.7753727866413102
key: test_recall
value: [0.6875 0.625 0.625 0.59375 0.75 0.74193548
0.64516129 0.75 0.625 0.65625 ]
mean value: 0.6699596774193548
key: train_recall
value: [0.76223776 0.76573427 0.77622378 0.74125874 0.71678322 0.74216028
0.77700348 0.75174825 0.72027972 0.74475524]
mean value: 0.7498184742087182
key: test_roc_auc
value: [0.71875 0.67613636 0.71022727 0.69460227 0.81818182 0.76869501
0.69758065 0.75872093 0.76598837 0.73510174]
mean value: 0.7343984433608403
key: train_roc_auc
value: [0.79842168 0.80016993 0.80795922 0.79556576 0.783328 0.79219973
0.80326001 0.79719392 0.7865358 0.79369742]
mean value: 0.7958331466379481
key: test_jcc
value: [0.51162791 0.45454545 0.48780488 0.46341463 0.64864865 0.575
0.47619048 0.57142857 0.55555556 0.525 ]
mean value: 0.5269216125540572
key: train_jcc
value: [0.62108262 0.62393162 0.63610315 0.61449275 0.5942029 0.61031519
0.6299435 0.61781609 0.59883721 0.61206897]
mean value: 0.6158794004895489
MCC on Blind test: 0.32
Accuracy on Blind test: 0.68
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.70295882 2.80787826 1.40164876 2.38740396 2.44509292 2.75604963
2.07706332 2.70040226 1.80623651 2.78324103]
mean value: 2.286797547340393
key: score_time
value: [0.01258898 0.01494551 0.01265264 0.01272011 0.0129509 0.01516461
0.01273394 0.01545763 0.01267385 0.01545882]
mean value: 0.013734698295593262
key: test_mcc
value: [0.60895504 0.37247785 0.48064296 0.34721981 0.48240733 0.48661327
0.55985938 0.47378743 0.5603175 0.45993751]
mean value: 0.48322180837308737
key: train_mcc
value: [0.85287344 0.93046902 0.79700491 0.89204246 0.93708923 0.95183175
0.85638717 0.96111937 0.86728304 0.92523004]
mean value: 0.897133043483631
key: test_accuracy
value: [0.80263158 0.69736842 0.75 0.68421053 0.75 0.74666667
0.76 0.73333333 0.78666667 0.73333333]
mean value: 0.744421052631579
key: train_accuracy
value: [0.9263623 0.96612666 0.90132548 0.9455081 0.96907216 0.97647059
0.92794118 0.98088235 0.93382353 0.96323529]
mean value: 0.9490747639261891
key: test_fscore
value: [0.7826087 0.62295082 0.6779661 0.61290323 0.65454545 0.70769231
0.75675676 0.71428571 0.73333333 0.6969697 ]
mean value: 0.6960012106408936
key: train_fscore
value: [0.91610738 0.95943563 0.88057041 0.93802345 0.96229803 0.97222222
0.91819699 0.9775475 0.92411467 0.95697074]
mean value: 0.9405487018518649
key: test_precision
value: [0.72972973 0.65517241 0.74074074 0.63333333 0.7826087 0.67647059
0.65116279 0.65789474 0.78571429 0.67647059]
mean value: 0.6989297902973735
key: train_precision
value: [0.88064516 0.96797153 0.89818182 0.90032154 0.98892989 0.96885813
0.88141026 0.96587031 0.89250814 0.94237288]
mean value: 0.9287069662172294
key: test_recall
value: [0.84375 0.59375 0.625 0.59375 0.5625 0.74193548
0.90322581 0.78125 0.6875 0.71875 ]
mean value: 0.705141129032258
key: train_recall
value: [0.95454545 0.95104895 0.86363636 0.97902098 0.93706294 0.97560976
0.95818815 0.98951049 0.95804196 0.97202797]
mean value: 0.953869301430277
key: test_roc_auc
value: [0.80823864 0.68323864 0.73295455 0.671875 0.72443182 0.74596774
0.78115836 0.73946221 0.77398256 0.73146802]
mean value: 0.7392777526768055
key: train_roc_auc
value: [0.93019894 0.96407409 0.89619477 0.95007029 0.96471467 0.9763545
0.93202029 0.98206489 0.93714281 0.96444038]
mean value: 0.9497275621991028
key: test_jcc
value: [0.64285714 0.45238095 0.51282051 0.44186047 0.48648649 0.54761905
0.60869565 0.55555556 0.57894737 0.53488372]
mean value: 0.5362106904361175
key: train_jcc
value: [0.84520124 0.9220339 0.7866242 0.88328076 0.92733564 0.94594595
0.84876543 0.95608108 0.85893417 0.91749175]
mean value: 0.889169411533274
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.05610561 0.04624677 0.04446769 0.04397821 0.04307437 0.04261947
0.04143143 0.04674125 0.04775262 0.04000306]
mean value: 0.04524204730987549
key: score_time
value: [0.00974059 0.00936556 0.00917125 0.00931931 0.00939131 0.00995302
0.00941157 0.00973129 0.01017833 0.00991678]
mean value: 0.009617900848388672
key: test_mcc
value: [0.59192216 0.57868822 0.67069242 0.64678324 0.59192216 0.64978463
0.67008798 0.62239581 0.726372 0.78140018]
mean value: 0.6530048787782348
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.80263158 0.78947368 0.82894737 0.82894737 0.80263158 0.82666667
0.84 0.81333333 0.86666667 0.89333333]
mean value: 0.8292631578947368
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.75409836 0.76470588 0.81690141 0.78688525 0.75409836 0.8
0.80645161 0.78787879 0.83870968 0.87096774]
mean value: 0.7980697078153612
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.79310345 0.72222222 0.74358974 0.82758621 0.79310345 0.76470588
0.80645161 0.76470588 0.86666667 0.9 ]
mean value: 0.7982135113536016
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.71875 0.8125 0.90625 0.75 0.71875 0.83870968
0.80645161 0.8125 0.8125 0.84375 ]
mean value: 0.802016129032258
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.79119318 0.79261364 0.83948864 0.81818182 0.79119318 0.82844575
0.83504399 0.81322674 0.85973837 0.88699128]
mean value: 0.8256116585964672
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.60526316 0.61904762 0.69047619 0.64864865 0.60526316 0.66666667
0.67567568 0.65 0.72222222 0.77142857]
mean value: 0.6654691909955068
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.63
Accuracy on Blind test: 0.82
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.17381358 0.16700077 0.16607237 0.17023063 0.17047739 0.17239523
0.16992283 0.16555643 0.16628838 0.17819834]
mean value: 0.1699955940246582
key: score_time
value: [0.02001786 0.01942492 0.02027106 0.02043462 0.02041912 0.0209651
0.0195148 0.01986623 0.02057624 0.02045608]
mean value: 0.020194602012634278
key: test_mcc
value: [0.56530828 0.29269769 0.4822 0.36720508 0.51078616 0.36478009
0.53058923 0.41225113 0.6184593 0.45494186]
mean value: 0.45992188220568675
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78947368 0.65789474 0.75 0.69736842 0.76315789 0.69333333
0.77333333 0.70666667 0.81333333 0.73333333]
mean value: 0.7377894736842106
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.74193548 0.58064516 0.68852459 0.59649123 0.70967742 0.62295082
0.72131148 0.67647059 0.78125 0.6875 ]
mean value: 0.68067567660675
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76666667 0.6 0.72413793 0.68 0.73333333 0.63333333
0.73333333 0.63888889 0.78125 0.6875 ]
mean value: 0.6978443486590038
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.71875 0.5625 0.65625 0.53125 0.6875 0.61290323
0.70967742 0.71875 0.78125 0.6875 ]
mean value: 0.666633064516129
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.77982955 0.64488636 0.73721591 0.67471591 0.75284091 0.68145161
0.76392962 0.70821221 0.80922965 0.72747093]
mean value: 0.7279782658732865
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58974359 0.40909091 0.525 0.425 0.55 0.45238095
0.56410256 0.51111111 0.64102564 0.52380952]
mean value: 0.5191264291264291
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.46
Accuracy on Blind test: 0.74
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01297903 0.01341438 0.01251864 0.01181912 0.01186943 0.0112195
0.01164675 0.0116322 0.0120945 0.01247215]
mean value: 0.012166571617126466
key: score_time
value: [0.0102303 0.00969195 0.00987315 0.00981855 0.00973344 0.0094564
0.00986338 0.00978637 0.00979996 0.00987363]
mean value: 0.009812712669372559
key: test_mcc
value: [0.21405867 0.22073036 0.12913133 0.48240733 0.4040992 0.16607219
0.13749357 0.33503026 0.17609018 0.04233617]
mean value: 0.23074492714957756
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.61842105 0.61842105 0.57894737 0.75 0.69736842 0.57333333
0.57333333 0.66666667 0.6 0.53333333]
mean value: 0.6209824561403509
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.53968254 0.55384615 0.48387097 0.65454545 0.67605634 0.55555556
0.51515152 0.63768116 0.51612903 0.44444444]
mean value: 0.5576963160674122
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.5483871 0.54545455 0.5 0.7826087 0.61538462 0.48780488
0.48571429 0.59459459 0.53333333 0.4516129 ]
mean value: 0.5544894948182328
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.53125 0.5625 0.46875 0.5625 0.75 0.64516129
0.5483871 0.6875 0.5 0.4375 ]
mean value: 0.5693548387096774
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.60653409 0.61079545 0.56392045 0.72443182 0.70454545 0.58394428
0.56964809 0.6693314 0.5872093 0.52107558]
mean value: 0.614143592716361
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.36956522 0.38297872 0.31914894 0.48648649 0.5106383 0.38461538
0.34693878 0.46808511 0.34782609 0.28571429]
mean value: 0.39019973005039743
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.11
Accuracy on Blind test: 0.58
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.52887535 2.54070401 2.48067498 2.40389156 2.36476159 2.3751862
2.35466981 2.37321854 2.36698627 2.35058093]
mean value: 2.4139549255371096
key: score_time
value: [0.10534024 0.10218143 0.09882855 0.10170054 0.09665203 0.09587097
0.0963614 0.09620166 0.1036911 0.09608531]
mean value: 0.09929132461547852
key: test_mcc
value: [0.67460105 0.59365605 0.73881068 0.56410605 0.70463922 0.78329779
0.75402183 0.7663997 0.83648256 0.81028771]
mean value: 0.7226302653782992
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84210526 0.80263158 0.86842105 0.78947368 0.85526316 0.89333333
0.88 0.88 0.92 0.90666667]
mean value: 0.8637894736842106
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.76190476 0.85294118 0.73333333 0.83076923 0.875
0.85714286 0.86956522 0.90625 0.89230769]
mean value: 0.8379214269319768
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.77419355 0.80555556 0.78571429 0.81818182 0.84848485
0.84375 0.81081081 0.90625 0.87878788]
mean value: 0.8328871603065151
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.75 0.90625 0.6875 0.84375 0.90322581
0.87096774 0.9375 0.90625 0.90625 ]
mean value: 0.8461693548387097
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.82954545 0.79545455 0.87357955 0.77556818 0.85369318 0.89479472
0.87866569 0.88735465 0.91824128 0.90661337]
mean value: 0.8613510621973675
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.61538462 0.74358974 0.57894737 0.71052632 0.77777778
0.75 0.76923077 0.82857143 0.80555556]
mean value: 0.7246250240987083
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: fit_time
value: [2.00792384 1.07964015 1.14048624 1.0987649 1.05054617 1.07768703
1.08028054 1.06092668 1.1231184 1.1270175 ]
mean value: 1.1846391439437867
key: score_time
value: [0.25205517 0.28182316 0.22127581 0.29803705 0.28213239 0.29165244
0.27522254 0.27276921 0.28709555 0.27418542]
mean value: 0.2736248731613159
key: test_mcc
value: [0.67460105 0.59365605 0.70914208 0.61935355 0.73011364 0.83784499
0.78005865 0.81415147 0.80876688 0.81028771]
mean value: 0.7377976067565862
key: train_mcc
value: [0.90938451 0.90326789 0.89730244 0.90962285 0.91874398 0.90651302
0.89762206 0.90958458 0.90344681 0.89741223]
mean value: 0.9052900373277607
key: test_accuracy
value: [0.84210526 0.80263158 0.85526316 0.81578947 0.86842105 0.92
0.89333333 0.90666667 0.90666667 0.90666667]
mean value: 0.8717543859649123
key: train_accuracy
value: [0.95581738 0.95287187 0.94992636 0.95581738 0.96023564 0.95441176
0.95 0.95588235 0.95294118 0.95 ]
mean value: 0.953790392445638
key: test_fscore
value: [0.8 0.76190476 0.8358209 0.76666667 0.84375 0.90625
0.87096774 0.89552239 0.88888889 0.89230769]
mean value: 0.8462079035285583
key: train_fscore
value: [0.94755245 0.94385965 0.94055944 0.94791667 0.95320624 0.94589878
0.94097222 0.94773519 0.94405594 0.94055944]
mean value: 0.9452316019904221
key: test_precision
value: [0.85714286 0.77419355 0.8 0.82142857 0.84375 0.87878788
0.87096774 0.85714286 0.90322581 0.87878788]
mean value: 0.8485427140064237
key: train_precision
value: [0.94755245 0.9471831 0.94055944 0.94137931 0.94501718 0.94755245
0.93771626 0.94444444 0.94405594 0.94055944]
mean value: 0.9436020018766904
key: test_recall
value: [0.75 0.75 0.875 0.71875 0.84375 0.93548387
0.87096774 0.9375 0.875 0.90625 ]
mean value: 0.8462701612903226
key: train_recall
value: [0.94755245 0.94055944 0.94055944 0.95454545 0.96153846 0.94425087
0.94425087 0.95104895 0.94405594 0.94055944]
mean value: 0.946892132257986
key: test_roc_auc
value: [0.82954545 0.79545455 0.85795455 0.80255682 0.86505682 0.92228739
0.89002933 0.91061047 0.90261628 0.90661337]
mean value: 0.8682725013639774
key: train_roc_auc
value: [0.95469225 0.95119575 0.94865122 0.95564423 0.960413 0.95304147
0.94922467 0.95521991 0.9517234 0.94870612]
mean value: 0.952851201686529
key: test_jcc
value: [0.66666667 0.61538462 0.71794872 0.62162162 0.72972973 0.82857143
0.77142857 0.81081081 0.8 0.80555556]
mean value: 0.7367717717717718
key: train_jcc
value: [0.90033223 0.89368771 0.88778878 0.9009901 0.91059603 0.89735099
0.88852459 0.90066225 0.89403974 0.88778878]
mean value: 0.8961761187106945
MCC on Blind test: 0.63
Accuracy on Blind test: 0.82
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.03019071 0.01226044 0.01226664 0.01272726 0.01263142 0.01290274
0.01258135 0.01262927 0.01268077 0.01258755]
mean value: 0.014345812797546386
key: score_time
value: [0.01134753 0.00947285 0.01033521 0.01011801 0.01010323 0.01012897
0.01017475 0.0101254 0.0101676 0.01010561]
mean value: 0.010207915306091308
key: test_mcc
value: [0.59365605 0.28140559 0.17447146 0.48519965 0.48956862 0.46313625
0.35577952 0.38895144 0.5603175 0.25810271]
mean value: 0.40505887826745546
key: train_mcc
value: [0.432963 0.46234808 0.46694919 0.46570418 0.44960003 0.44154422
0.43213035 0.44706869 0.45428806 0.45555535]
mean value: 0.450815115609883
key: test_accuracy
value: [0.80263158 0.64473684 0.59210526 0.75 0.75 0.73333333
0.69333333 0.69333333 0.78666667 0.62666667]
mean value: 0.707280701754386
key: train_accuracy
value: [0.72164948 0.73784978 0.73784978 0.73784978 0.72901325 0.72647059
0.71911765 0.72941176 0.73235294 0.73235294]
mean value: 0.7303917958936151
key: test_fscore
value: [0.76190476 0.59701493 0.53731343 0.6984127 0.70769231 0.6969697
0.59649123 0.66666667 0.73333333 0.6 ]
mean value: 0.6595799051258595
key: train_fscore
value: [0.67692308 0.68881119 0.69727891 0.69520548 0.68813559 0.68041237
0.68113523 0.68275862 0.68835616 0.69047619]
mean value: 0.686949282203034
key: test_precision
value: [0.77419355 0.57142857 0.51428571 0.70967742 0.6969697 0.65714286
0.65384615 0.62162162 0.78571429 0.55263158]
mean value: 0.6537511447698205
key: train_precision
value: [0.66220736 0.68881119 0.67880795 0.68120805 0.66776316 0.67118644
0.65384615 0.67346939 0.67449664 0.67218543]
mean value: 0.67239817623147
key: test_recall
value: [0.75 0.625 0.5625 0.6875 0.71875 0.74193548
0.5483871 0.71875 0.6875 0.65625 ]
mean value: 0.6696572580645161
key: train_recall
value: [0.69230769 0.68881119 0.71678322 0.70979021 0.70979021 0.68989547
0.71080139 0.69230769 0.7027972 0.70979021]
mean value: 0.702307448648912
key: test_roc_auc
value: [0.79545455 0.64204545 0.58806818 0.74147727 0.74573864 0.73460411
0.67192082 0.6965843 0.77398256 0.63045058]
mean value: 0.7020326459455773
key: train_roc_auc
value: [0.71765512 0.73117404 0.73498194 0.73402996 0.72639638 0.72153807
0.71799612 0.72432643 0.72830215 0.72926059]
mean value: 0.7265660801452282
key: test_jcc
value: [0.61538462 0.42553191 0.36734694 0.53658537 0.54761905 0.53488372
0.425 0.5 0.57894737 0.42857143]
mean value: 0.4959870400449163
key: train_jcc
value: [0.51162791 0.52533333 0.53524804 0.5328084 0.5245478 0.515625
0.5164557 0.51832461 0.52480418 0.52727273]
mean value: 0.523204769300403
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.15459466 0.12359762 0.13257623 0.11857748 0.12010121 0.12283087
0.11335015 0.26927233 0.12270784 0.1195004 ]
mean value: 0.1397108793258667
key: score_time
value: [0.01213288 0.01233554 0.0116086 0.01127672 0.01136041 0.01135826
0.01148677 0.01124954 0.01111746 0.01170921]
mean value: 0.011563539505004883
key: test_mcc
value: [0.81056883 0.64788424 0.86594218 0.6198304 0.75650539 0.83784499
0.80876688 0.78485412 0.86351193 0.78485412]
mean value: 0.778056307737122
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90789474 0.82894737 0.93421053 0.81578947 0.88157895 0.92
0.90666667 0.89333333 0.93333333 0.89333333]
mean value: 0.8915087719298246
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.79365079 0.92307692 0.77419355 0.85245902 0.90625
0.88888889 0.87878788 0.92063492 0.87878788]
mean value: 0.8705618737496712
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90322581 0.80645161 0.90909091 0.8 0.89655172 0.87878788
0.875 0.85294118 0.93548387 0.85294118]
mean value: 0.8710474155280475
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.78125 0.9375 0.75 0.8125 0.93548387
0.90322581 0.90625 0.90625 0.90625 ]
mean value: 0.8713709677419355
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90340909 0.82244318 0.93465909 0.80681818 0.87215909 0.92228739
0.90615836 0.89498547 0.92986919 0.89498547]
mean value: 0.8887774500443293
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.65789474 0.85714286 0.63157895 0.74285714 0.82857143
0.8 0.78378378 0.85294118 0.78378378]
mean value: 0.7738553856820111
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.05141187 0.09285474 0.0785737 0.09084392 0.08712173 0.07668972
0.08316016 0.08379889 0.08368492 0.08396983]
mean value: 0.08121094703674317
key: score_time
value: [0.01939702 0.02001953 0.01978993 0.01961112 0.02019668 0.01976013
0.0197196 0.01980257 0.01976991 0.0196898 ]
mean value: 0.0197756290435791
key: test_mcc
value: [0.47970161 0.46022727 0.48956862 0.41185791 0.51905381 0.6340079
0.54846768 0.52770861 0.56395349 0.57412984]
mean value: 0.5208676744130143
key: train_mcc
value: [0.75228604 0.76267653 0.75505868 0.75783987 0.76384416 0.73844324
0.73034032 0.74334134 0.7373458 0.75579488]
mean value: 0.7496970872434735
key: test_accuracy
value: [0.73684211 0.73684211 0.75 0.71052632 0.76315789 0.81333333
0.77333333 0.76 0.78666667 0.78666667]
mean value: 0.7617368421052632
key: train_accuracy
value: [0.87776141 0.88365243 0.87923417 0.88070692 0.88365243 0.87205882
0.86617647 0.87352941 0.87058824 0.87941176]
mean value: 0.8766772069652603
key: test_fscore
value: [0.71428571 0.6875 0.70769231 0.66666667 0.72727273 0.79411765
0.74626866 0.74285714 0.75 0.76470588]
mean value: 0.7301366744902742
key: train_fscore
value: [0.85908319 0.86402754 0.86054422 0.86201022 0.86541738 0.84974093
0.84757119 0.8537415 0.85034014 0.86101695]
mean value: 0.8573493249947532
key: test_precision
value: [0.65789474 0.6875 0.6969697 0.64705882 0.70588235 0.72972973
0.69444444 0.68421053 0.75 0.72222222]
mean value: 0.6975912532994577
key: train_precision
value: [0.8349835 0.85084746 0.83774834 0.84053156 0.84385382 0.84246575
0.81612903 0.83112583 0.82781457 0.83552632]
mean value: 0.8361026181230804
key: test_recall
value: [0.78125 0.6875 0.71875 0.6875 0.75 0.87096774
0.80645161 0.8125 0.75 0.8125 ]
mean value: 0.7677419354838709
key: train_recall
value: [0.88461538 0.87762238 0.88461538 0.88461538 0.88811189 0.85714286
0.8815331 0.87762238 0.87412587 0.88811189]
mean value: 0.8798116517628712
key: test_roc_auc
value: [0.74289773 0.73011364 0.74573864 0.70738636 0.76136364 0.82184751
0.77822581 0.76671512 0.78197674 0.78997093]
mean value: 0.7626236104480666
key: train_roc_auc
value: [0.87869446 0.88283155 0.87996673 0.88123899 0.88425951 0.87004726
0.86824747 0.87409038 0.87107309 0.88060417]
mean value: 0.8771053583080383
key: test_jcc
value: [0.55555556 0.52380952 0.54761905 0.5 0.57142857 0.65853659
0.5952381 0.59090909 0.6 0.61904762]
mean value: 0.5762144088973358
key: train_jcc
value: [0.75297619 0.76060606 0.75522388 0.75748503 0.76276276 0.73873874
0.73546512 0.74480712 0.73964497 0.75595238]
mean value: 0.750366225242826
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
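Note on the fscore and jcc rows above: for a binary problem both follow from precision and recall, so they carry no independent information. The minimal sketch below recomputes the first LDA fold from its logged test_precision and test_recall. The helper names are hypothetical and are not taken from the original script.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (binary F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def jaccard_from_f1(f1):
    """Binary Jaccard index expressed through F1: J = F1 / (2 - F1)."""
    return f1 / (2 - f1)


# First-fold LDA values copied from the log (test_precision, test_recall).
precision, recall = 0.65789474, 0.78125
f1 = f1_from_pr(precision, recall)
jcc = jaccard_from_f1(f1)
print(f"fscore={f1:.8f} jcc={jcc:.8f}")
```

The two results can be compared against the first entries of test_fscore and test_jcc in the LDA block.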
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01528907 0.0125401 0.0123477 0.01201797 0.01186919 0.01119399
0.01182628 0.01187348 0.0116477 0.01158071]
mean value: 0.012218618392944336
key: score_time
value: [0.01129532 0.01013231 0.00990796 0.0098362 0.00951409 0.00974369
0.00975084 0.00975108 0.00947189 0.00925994]
mean value: 0.0098663330078125
key: test_mcc
value: [0.35825997 0.38203331 0.36519159 0.39836355 0.59365605 0.5716838
0.33434239 0.49312416 0.5930524 0.36594507]
mean value: 0.4455652298405159
key: train_mcc
value: [0.47425208 0.45368292 0.47959222 0.49692127 0.46628488 0.46594407
0.46798304 0.46220361 0.47219115 0.45022029]
mean value: 0.4689275528576301
key: test_accuracy
value: [0.68421053 0.69736842 0.68421053 0.71052632 0.80263158 0.78666667
0.68 0.73333333 0.8 0.68 ]
mean value: 0.7258947368421053
key: train_accuracy
value: [0.74079529 0.73195876 0.7437408 0.75110457 0.73637703 0.73676471
0.73676471 0.73529412 0.73970588 0.72941176]
mean value: 0.7381917612405787
key: test_fscore
value: [0.63636364 0.64615385 0.64705882 0.63333333 0.76190476 0.75757576
0.6 0.72972973 0.76923077 0.65714286]
mean value: 0.6838493514964104
key: train_fscore
value: [0.7027027 0.68835616 0.70508475 0.71691792 0.69915966 0.69814503
0.70116861 0.69491525 0.70151771 0.68813559]
mean value: 0.6996103393349323
key: test_precision
value: [0.61764706 0.63636364 0.61111111 0.67857143 0.77419355 0.71428571
0.62068966 0.64285714 0.75757576 0.60526316]
mean value: 0.6658558211042568
key: train_precision
value: [0.67973856 0.67449664 0.68421053 0.68810289 0.67313916 0.67647059
0.67307692 0.67434211 0.67752443 0.66776316]
mean value: 0.6768864989606861
key: test_recall
value: [0.65625 0.65625 0.6875 0.59375 0.75 0.80645161
0.58064516 0.84375 0.78125 0.71875 ]
mean value: 0.7074596774193549
key: train_recall
value: [0.72727273 0.7027972 0.72727273 0.74825175 0.72727273 0.72125436
0.73170732 0.71678322 0.72727273 0.70979021]
mean value: 0.7239674959187155
key: test_roc_auc
value: [0.68039773 0.69176136 0.68465909 0.69460227 0.79545455 0.78958944
0.66532258 0.7474564 0.79760174 0.6849564 ]
mean value: 0.7231801558344132
key: train_roc_auc
value: [0.73895443 0.72798893 0.74149896 0.7507162 0.73513764 0.73467298
0.73608267 0.73275709 0.73800185 0.72672252]
mean value: 0.7362533259808247
key: test_jcc
value: [0.46666667 0.47727273 0.47826087 0.46341463 0.61538462 0.6097561
0.42857143 0.57446809 0.625 0.4893617 ]
mean value: 0.5228156826402015
key: train_jcc
value: [0.54166667 0.52480418 0.54450262 0.55874674 0.5374677 0.53626943
0.53984576 0.53246753 0.54025974 0.5245478 ]
mean value: 0.5380578163315645
MCC on Blind test: 0.32
Accuracy on Blind test: 0.68
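Note on reading the key/value/mean triples back in: the sketch below parses one triple copied from the Multinomial block and recomputes the mean, adding the fold-to-fold standard deviation that the log does not print. The function name and the regular expression are assumptions made for illustration, not part of the original script.

```python
import re
from statistics import mean, stdev

# One key/value/mean triple copied from the Multinomial block of the log.
LOG_SNIPPET = """\
key: test_mcc
value: [0.35825997 0.38203331 0.36519159 0.39836355 0.59365605 0.5716838
 0.33434239 0.49312416 0.5930524 0.36594507]
mean value: 0.4455652298405159
"""


def parse_metrics(text):
    """Return {key: [per-fold values]} for every key/value pair in text."""
    # The bracketed array may wrap over several lines, so match up to ']'.
    pattern = re.compile(r"key: (\S+)\s+value: \[([^\]]*)\]")
    return {
        key: [float(token) for token in body.split()]
        for key, body in pattern.findall(text)
    }


metrics = parse_metrics(LOG_SNIPPET)
folds = metrics["test_mcc"]
fold_mean = mean(folds)
fold_sd = stdev(folds)
print(f"n_folds={len(folds)} mean={fold_mean:.6f} sd={fold_sd:.6f}")
```

Reporting the spread next to the mean makes it easier to judge whether two models with similar mean test_mcc really differ.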
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02463841 0.03144503 0.02579641 0.02777719 0.02671218 0.02404237
0.02409434 0.0281806 0.02527308 0.02363944]
mean value: 0.026159906387329103
key: score_time
value: [0.01157069 0.01227355 0.01218486 0.01215577 0.01218534 0.01224351
0.0121696 0.01213717 0.01212764 0.01214361]
mean value: 0.012119174003601074
key: test_mcc
value: [0.2763854 0.34311605 0.56410605 0.44912659 0.6053757 0.49865921
0.64112865 0.30980985 0.42806382 0.22589682]
mean value: 0.4341668135250491
key: train_mcc
value: [0.24210359 0.65038224 0.62574954 0.52435929 0.62830318 0.6292708
0.62197849 0.24323178 0.40003685 0.31982482]
mean value: 0.4885240589433125
key: test_accuracy
value: [0.63157895 0.68421053 0.78947368 0.65789474 0.80263158 0.76
0.82666667 0.64 0.69333333 0.62666667]
mean value: 0.7112456140350878
key: train_accuracy
value: [0.62150221 0.83063328 0.81885125 0.69955817 0.8173785 0.81911765
0.81617647 0.62058824 0.69558824 0.64852941]
mean value: 0.7387923416789396
key: test_fscore
value: [0.22222222 0.6 0.73333333 0.70454545 0.71698113 0.68965517
0.78688525 0.27027027 0.43902439 0.3 ]
mean value: 0.5462917221006087
key: train_fscore
value: [0.18927445 0.78743068 0.77348066 0.7357513 0.752 0.76116505
0.76007678 0.17834395 0.46233766 0.28228228]
mean value: 0.568214280782849
key: test_precision
value: [1. 0.64285714 0.78571429 0.55357143 0.9047619 0.74074074
0.8 1. 1. 0.75 ]
mean value: 0.8177645502645503
key: train_precision
value: [0.96774194 0.83529412 0.81712062 0.58436214 0.87850467 0.85964912
0.84615385 1. 0.8989899 1. ]
mean value: 0.8687816356464677
key: test_recall
value: [0.125 0.5625 0.6875 0.96875 0.59375 0.64516129
0.77419355 0.15625 0.28125 0.1875 ]
mean value: 0.4981854838709677
key: train_recall
value: [0.1048951 0.74475524 0.73426573 0.99300699 0.65734266 0.68292683
0.68989547 0.0979021 0.31118881 0.16433566]
mean value: 0.5180514607343876
key: test_roc_auc
value: [0.5625 0.66761364 0.77556818 0.70028409 0.77414773 0.74303519
0.81891496 0.578125 0.640625 0.57049419]
mean value: 0.6831307969037714
key: train_roc_auc
value: [0.55117529 0.81894251 0.80733643 0.73950604 0.79559245 0.80075095
0.79914621 0.54895105 0.64290405 0.58216783]
mean value: 0.7086472800759291
key: test_jcc
value: [0.125 0.42857143 0.57894737 0.54385965 0.55882353 0.52631579
0.64864865 0.15625 0.28125 0.17647059]
mean value: 0.402413700188468
key: train_jcc
value: [0.10452962 0.64939024 0.63063063 0.58196721 0.6025641 0.61442006
0.6130031 0.0979021 0.30067568 0.16433566]
mean value: 0.43594184035212596
MCC on Blind test: 0.39
Accuracy on Blind test: 0.71
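Note on the unstable Passive Aggressive folds above: the first fold has test_precision 1.0 but test_recall 0.125, meaning the model predicted the positive class only a handful of times. The sketch below rebuilds a confusion matrix that is consistent with the logged ratios and recomputes each metric from it. The counts are inferred (recall 0.125 = 4/32, accuracy 0.63157895 = 48/76) and are not printed anywhere in the log.

```python
from math import sqrt

# Counts inferred from the logged fold-1 ratios; an assumption, not log output.
tp, fn, fp, tn = 4, 28, 0, 44

precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)
# With hard 0/1 predictions, ROC AUC reduces to the mean of recall and specificity.
balanced = (recall + specificity) / 2
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy={accuracy:.8f} fscore={f1:.8f} jcc={jaccard:.3f}")
print(f"roc_auc={balanced:.4f} mcc={mcc:.7f}")
```

The recomputed values can be checked against the first entries of test_accuracy, test_fscore, test_jcc, test_roc_auc and test_mcc. That the logged test_roc_auc matches the mean of recall and specificity suggests, though the log does not state it, that roc_auc was scored on predicted labels rather than on decision scores.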
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0311873 0.03052235 0.0290103 0.03574967 0.03354788 0.02688074
0.03113914 0.02523851 0.03100848 0.02994823]
mean value: 0.03042325973510742
key: score_time
value: [0.01227045 0.0122273 0.01223731 0.01219487 0.01226044 0.01223755
0.01211143 0.01226664 0.01223755 0.01227546]
mean value: 0.012231898307800294
key: test_mcc
value: [0.51340315 0.45464014 0.33711412 0.48064296 0.44756059 0.52646266
0.50718127 0.55985938 0.63555097 0.48272697]
mean value: 0.4945142231717486
key: train_mcc
value: [0.6517648 0.64055475 0.60314556 0.70913466 0.48594004 0.60604618
0.64341812 0.6951258 0.66005565 0.6439389 ]
mean value: 0.6339124456528761
key: test_accuracy
value: [0.76315789 0.73684211 0.68421053 0.75 0.72368421 0.77333333
0.76 0.76 0.81333333 0.73333333]
mean value: 0.7497894736842106
key: train_accuracy
value: [0.82474227 0.82326951 0.80559647 0.85861561 0.73048601 0.80735294
0.82205882 0.83382353 0.81911765 0.80735294]
mean value: 0.8132415749805076
key: test_fscore
value: [0.66666667 0.62962963 0.55555556 0.6779661 0.55319149 0.70175439
0.64 0.76315789 0.8 0.72222222]
mean value: 0.6710143945832445
key: train_fscore
value: [0.75259875 0.76095618 0.73493976 0.82156134 0.53904282 0.74059406
0.75255624 0.82852807 0.81105991 0.80241327]
mean value: 0.7544250396680352
key: test_precision
value: [0.81818182 0.77272727 0.68181818 0.74074074 0.86666667 0.76923077
0.84210526 0.65909091 0.73684211 0.65 ]
mean value: 0.7537403726877411
key: train_precision
value: [0.92820513 0.88425926 0.86320755 0.87698413 0.96396396 0.85779817
0.91089109 0.73190349 0.72328767 0.70557029]
mean value: 0.8446070728093572
key: test_recall
value: [0.5625 0.53125 0.46875 0.625 0.40625 0.64516129
0.51612903 0.90625 0.875 0.8125 ]
mean value: 0.6348790322580645
key: train_recall
value: [0.63286713 0.66783217 0.63986014 0.77272727 0.37412587 0.65156794
0.64111498 0.95454545 0.92307692 0.93006993]
mean value: 0.7187787821934164
key: test_roc_auc
value: [0.73579545 0.70880682 0.65482955 0.73295455 0.68039773 0.75439883
0.72397361 0.7787064 0.82122093 0.7434593 ]
mean value: 0.7334543152833662
key: train_roc_auc
value: [0.79862186 0.80210947 0.7830344 0.84692343 0.68197388 0.78634377
0.79765673 0.85036917 0.83336587 0.82417202]
mean value: 0.8004570600754091
key: test_jcc
value: [0.5 0.45945946 0.38461538 0.51282051 0.38235294 0.54054054
0.47058824 0.61702128 0.66666667 0.56521739]
mean value: 0.5099282408473245
key: train_jcc
value: [0.60333333 0.61414791 0.58095238 0.69716088 0.36896552 0.58805031
0.60327869 0.70725389 0.68217054 0.67002519]
mean value: 0.6115338645328594
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.24882793 0.23553991 0.23604679 0.23582315 0.22026038 0.22161198
0.22113228 0.22108078 0.22125912 0.22034097]
mean value: 0.22819232940673828
key: score_time
value: [0.01706672 0.01650715 0.01781344 0.01561666 0.01562548 0.01558304
0.015558 0.01575947 0.01544976 0.0155189 ]
mean value: 0.016049861907958984
key: test_mcc
value: [0.75906419 0.70463922 0.74620251 0.56530828 0.75840687 0.80693778
0.75856554 0.82032088 0.87424206 0.75597889]
mean value: 0.7549666233834014
key: train_mcc
value: [0.88569188 0.8827871 0.88843663 0.88534567 0.90948725 0.88249665
0.8914807 0.88270706 0.86486766 0.88841788]
mean value: 0.8861718479384622
key: test_accuracy
value: [0.88157895 0.85526316 0.86842105 0.78947368 0.88157895 0.90666667
0.88 0.90666667 0.93333333 0.88 ]
mean value: 0.8782982456140351
key: train_accuracy
value: [0.94403535 0.94256259 0.9455081 0.94403535 0.95581738 0.94264706
0.94705882 0.94264706 0.93382353 0.94558824]
mean value: 0.9443723468768951
key: test_fscore
value: [0.84745763 0.83076923 0.85714286 0.74193548 0.86153846 0.8852459
0.86153846 0.89855072 0.92753623 0.86153846]
mean value: 0.8573253441678168
key: train_fscore
value: [0.93425606 0.93264249 0.93565217 0.93379791 0.94773519 0.93217391
0.93728223 0.93240901 0.92227979 0.93542757]
mean value: 0.9343656339425788
key: test_precision
value: [0.92592593 0.81818182 0.78947368 0.76666667 0.84848485 0.9
0.82352941 0.83783784 0.86486486 0.84848485]
mean value: 0.8423449906422042
key: train_precision
value: [0.92465753 0.92150171 0.93079585 0.93055556 0.94444444 0.93055556
0.93728223 0.92439863 0.9112628 0.93379791]
mean value: 0.9289252207474825
key: test_recall
value: [0.78125 0.84375 0.9375 0.71875 0.875 0.87096774
0.90322581 0.96875 1. 0.875 ]
mean value: 0.8774193548387097
key: train_recall
value: [0.94405594 0.94405594 0.94055944 0.93706294 0.95104895 0.93379791
0.93728223 0.94055944 0.93356643 0.93706294]
mean value: 0.939905216734485
key: test_roc_auc
value: [0.86789773 0.85369318 0.87784091 0.77982955 0.88068182 0.90139296
0.88343109 0.91460756 0.94186047 0.87936047]
mean value: 0.8780595717111096
key: train_roc_auc
value: [0.94403815 0.94276589 0.94483443 0.94308618 0.95516824 0.94145366
0.94574035 0.94236094 0.93378829 0.94441979]
mean value: 0.9437655919246752
key: test_jcc
value: [0.73529412 0.71052632 0.75 0.58974359 0.75675676 0.79411765
0.75675676 0.81578947 0.86486486 0.75675676]
mean value: 0.7530606279058292
key: train_jcc
value: [0.87662338 0.87378641 0.87908497 0.87581699 0.90066225 0.87296417
0.88196721 0.87337662 0.85576923 0.87868852]
mean value: 0.876873975806219
MCC on Blind test: 0.6
Accuracy on Blind test: 0.81
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
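Note on the two warnings above: BaggingClassifier is run with its default of 10 estimators, and a sample only gets an out-of-bag (OOB) prediction if at least one bootstrap leaves it out. A sample drawn into all 10 bootstraps has an all-zero row in `predictions`, so the division shown in the RuntimeWarning becomes 0/0. The rough sketch below estimates how often that happens. The training-fold size of 679 is an assumption inferred from the fold accuracies, and the function name is hypothetical.

```python
def prob_never_oob(n_samples, n_estimators):
    """Probability that one sample lands in every bootstrap draw.

    Each bootstrap draws n_samples rows with replacement, so a given row
    is in-bag with probability 1 - (1 - 1/n_samples) ** n_samples.
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


n_train = 679  # assumption: approximate size of one CV training fold
for n_estimators in (10, 50, 100):
    p = prob_never_oob(n_train, n_estimators)
    print(f"n_estimators={n_estimators:3d} "
          f"P(no OOB)={p:.3e} expected rows={p * n_train:.3f}")
```

Under these assumptions a few rows per fit have no OOB estimate at 10 estimators, while raising `n_estimators` makes the case vanishingly rare, which is what the UserWarning text hints at.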
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.1199789 0.11597323 0.1297648 0.13439751 0.1378653 0.13134599
0.1068418 0.12545419 0.11151147 0.11002517]
mean value: 0.12231583595275879
key: score_time
value: [0.02938819 0.03441763 0.03304958 0.0306685 0.04402232 0.02725291
0.03886533 0.02465963 0.03792119 0.02057219]
mean value: 0.03208174705505371
key: test_mcc
value: [0.73011364 0.59365605 0.81056883 0.61935355 0.75650539 0.78329779
0.77914379 0.76011486 0.83648256 0.70167006]
mean value: 0.737090652577633
key: train_mcc
value: [0.98491637 0.98497426 0.9969826 0.99395897 0.98497426 0.98794895
0.9909846 0.97297152 0.97588669 0.98793085]
mean value: 0.9861529074785003
key: test_accuracy
value: [0.86842105 0.80263158 0.90789474 0.81578947 0.88157895 0.89333333
0.89333333 0.88 0.92 0.85333333]
mean value: 0.8716315789473684
key: train_accuracy
value: [0.99263623 0.99263623 0.99852725 0.99705449 0.99263623 0.99411765
0.99558824 0.98676471 0.98823529 0.99411765]
mean value: 0.9932313956510439
key: test_fscore
value: [0.84375 0.76190476 0.88888889 0.76666667 0.85245902 0.875
0.86666667 0.86567164 0.90625 0.83076923]
mean value: 0.8458026873080702
key: train_fscore
value: [0.99121265 0.99118166 0.99824869 0.9965035 0.99118166 0.99300699
0.99474606 0.9840708 0.98591549 0.99300699]
mean value: 0.9919074487470159
key: test_precision
value: [0.84375 0.77419355 0.90322581 0.82142857 0.89655172 0.84848485
0.89655172 0.82857143 0.90625 0.81818182]
mean value: 0.8537189469781239
key: train_precision
value: [0.99646643 1. 1. 0.9965035 1. 0.99649123
1. 0.99641577 0.9929078 0.99300699]
mean value: 0.997179172070383
key: test_recall
value: [0.84375 0.75 0.875 0.71875 0.8125 0.90322581
0.83870968 0.90625 0.90625 0.84375 ]
mean value: 0.8398185483870968
key: train_recall
value: [0.98601399 0.98251748 0.9965035 0.9965035 0.98251748 0.98954704
0.98954704 0.97202797 0.97902098 0.99300699]
mean value: 0.9867205964766941
key: test_roc_auc
value: [0.86505682 0.79545455 0.90340909 0.80255682 0.87215909 0.89479472
0.88526393 0.88335756 0.91824128 0.85210756]
mean value: 0.8672401410011594
key: train_roc_auc
value: [0.99173473 0.99125874 0.99825175 0.99697948 0.99125874 0.99350125
0.99477352 0.98474495 0.98697242 0.99396543]
mean value: 0.9923441010825366
key: test_jcc
value: [0.72972973 0.61538462 0.8 0.62162162 0.74285714 0.77777778
0.76470588 0.76315789 0.82857143 0.71052632]
mean value: 0.7354332408821573
key: train_jcc
value: [0.9825784 0.98251748 0.9965035 0.99303136 0.98251748 0.98611111
0.98954704 0.96864111 0.97222222 0.98611111]
mean value: 0.9839780815390572
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
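For readers checking the reported scores, the sketch below shows how MCC follows from confusion-matrix counts. The counts are not printed in the log. They are an inference from the first Bagging Classifier fold above (test_accuracy 0.86842105, test_precision and test_recall both 0.84375), so treat them as an assumption. The function name is hypothetical.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any marginal total is zero
        return 0.0
    return (tp * tn - fp * fn) / denom


# Assumed counts for fold 1 (inferred, not logged): 76 test rows in total
tp, tn, fp, fn = 27, 39, 5, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(mcc(tp, tn, fp, fn), 8), round(accuracy, 8))
```

With these counts the result agrees with the first test_mcc and test_accuracy entries logged for this model.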
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.29794788 0.33068109 0.32837296 0.22896051 0.25346708 0.25612593
0.30490828 0.32389164 0.38753223 0.31187224]
mean value: 0.30237598419189454
key: score_time
value: [0.03014493 0.03123856 0.03024912 0.01791668 0.01918936 0.01773715
0.01803207 0.03264809 0.03435659 0.01904559]
mean value: 0.025055813789367675
key: test_mcc
value: [0.42733892 0.23817557 0.31048235 0.30837431 0.42733892 0.38584038
0.44554274 0.37080648 0.44845932 0.40043605]
mean value: 0.376279503821083
key: train_mcc
value: [0.91859995 0.93049929 0.91836725 0.92155805 0.91836725 0.93363727
0.92457091 0.92471388 0.91551596 0.90937403]
mean value: 0.9215203827710721
key: test_accuracy
value: [0.72368421 0.63157895 0.67105263 0.67105263 0.72368421 0.70666667
0.73333333 0.69333333 0.73333333 0.70666667]
mean value: 0.6994385964912281
key: train_accuracy
value: [0.96023564 0.96612666 0.96023564 0.96170839 0.96023564 0.96764706
0.96323529 0.96323529 0.95882353 0.95588235]
mean value: 0.9617365502902192
key: test_fscore
value: [0.6557377 0.5483871 0.56140351 0.54545455 0.6557377 0.62068966
0.66666667 0.63492063 0.64285714 0.65625 ]
mean value: 0.6188104660453593
key: train_fscore
value: [0.95304348 0.95971979 0.95254833 0.95470383 0.95254833 0.96153846
0.95621716 0.95652174 0.95104895 0.9471831 ]
mean value: 0.9545073174845851
key: test_precision
value: [0.68965517 0.56666667 0.64 0.65217391 0.68965517 0.66666667
0.68965517 0.64516129 0.75 0.65625 ]
mean value: 0.6645884053940772
key: train_precision
value: [0.94809689 0.96140351 0.95759717 0.95138889 0.95759717 0.96491228
0.96126761 0.95155709 0.95104895 0.95390071]
mean value: 0.9558770269793693
key: test_recall
value: [0.625 0.53125 0.5 0.46875 0.625 0.58064516
0.64516129 0.625 0.5625 0.65625 ]
mean value: 0.5819556451612903
key: train_recall
value: [0.95804196 0.95804196 0.94755245 0.95804196 0.94755245 0.95818815
0.95121951 0.96153846 0.95104895 0.94055944]
mean value: 0.9531785287882849
key: test_roc_auc
value: [0.71022727 0.61789773 0.64772727 0.64346591 0.71022727 0.68804985
0.72030792 0.68459302 0.71148256 0.70021802]
mean value: 0.6834196830457614
key: train_roc_auc
value: [0.95993701 0.96502607 0.95850905 0.96120927 0.95850905 0.96637143
0.96161485 0.96300273 0.95775798 0.95378226]
mean value: 0.9605719693449956
key: test_jcc
value: [0.48780488 0.37777778 0.3902439 0.375 0.48780488 0.45
0.5 0.46511628 0.47368421 0.48837209]
mean value: 0.44958040189337023
key: train_jcc
value: [0.910299 0.92255892 0.90939597 0.91333333 0.90939597 0.92592593
0.91610738 0.91666667 0.90666667 0.89966555]
mean value: 0.91300153991723
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.98012018 1.0008018 0.97443652 0.96889162 0.98432636 0.98858047
0.97377872 0.96670055 0.95690131 0.9568224 ]
mean value: 0.9751359939575195
key: score_time
value: [0.0106864 0.0102756 0.00961518 0.01002574 0.01072812 0.00973248
0.00965405 0.00952554 0.00958014 0.00956964]
mean value: 0.009939289093017578
key: test_mcc
value: [0.83791887 0.70463922 0.86954326 0.73011364 0.72984855 0.86351193
0.83504399 0.8439277 0.89098837 0.78485412]
mean value: 0.8090389648876973
key: train_mcc
value: [0.95467504 0.96074971 0.95774986 0.96375527 0.97886645 0.94879831
0.97291732 0.96691528 0.95471476 0.95773493]
mean value: 0.9616876917488373
key: test_accuracy
value: [0.92105263 0.85526316 0.93421053 0.86842105 0.86842105 0.93333333
0.92 0.92 0.94666667 0.89333333]
mean value: 0.9060701754385965
key: train_accuracy
value: [0.97790869 0.9808542 0.97938144 0.98232695 0.98969072 0.975
0.98676471 0.98382353 0.97794118 0.97941176]
mean value: 0.9813103179416096
key: test_fscore
value: [0.90322581 0.83076923 0.92537313 0.84375 0.83333333 0.92063492
0.90322581 0.91176471 0.9375 0.87878788]
mean value: 0.88883648166393
key: train_fscore
value: [0.9737303 0.97707231 0.97526502 0.97887324 0.98769772 0.97001764
0.98418278 0.98053097 0.97363796 0.9754386 ]
mean value: 0.9776446525287324
key: test_precision
value: [0.93333333 0.81818182 0.88571429 0.84375 0.89285714 0.90625
0.90322581 0.86111111 0.9375 0.85294118]
mean value: 0.8834864674119892
key: train_precision
value: [0.9754386 0.98576512 0.98571429 0.9858156 0.99293286 0.98214286
0.9929078 0.99283154 0.97879859 0.97887324]
mean value: 0.9851220497577359
key: test_recall
value: [0.875 0.84375 0.96875 0.84375 0.78125 0.93548387
0.90322581 0.96875 0.9375 0.90625 ]
mean value: 0.8963709677419355
key: train_recall
value: [0.97202797 0.96853147 0.96503497 0.97202797 0.98251748 0.95818815
0.97560976 0.96853147 0.96853147 0.97202797]
mean value: 0.9703028678638435
key: test_roc_auc
value: [0.91477273 0.85369318 0.93892045 0.86505682 0.85653409 0.93365103
0.91752199 0.92623547 0.94549419 0.89498547]
mean value: 0.9046865409534202
key: train_roc_auc
value: [0.97710813 0.97917668 0.97742842 0.98092493 0.98871421 0.97273275
0.98526035 0.98172766 0.97665152 0.97839977]
mean value: 0.9798124432188077
key: test_jcc
value: [0.82352941 0.71052632 0.86111111 0.72972973 0.71428571 0.85294118
0.82352941 0.83783784 0.88235294 0.78378378]
mean value: 0.801962743371412
key: train_jcc
value: [0.94880546 0.95517241 0.95172414 0.95862069 0.97569444 0.94178082
0.96885813 0.96180556 0.94863014 0.95205479]
mean value: 0.956314658704271
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03680634 0.03623056 0.03745699 0.0447278 0.03589249 0.0368638
0.03617692 0.03619432 0.03663182 0.04341626]
mean value: 0.03803973197937012
key: score_time
value: [0.01258159 0.01285815 0.01291418 0.01342416 0.01339483 0.01283622
0.01341558 0.01345205 0.01280928 0.01292062]
mean value: 0.013060665130615235
key: test_mcc
value: [ 0.06511653 -0.01736441 0.01131552 -0.05484543 0.09543651 0.07872809
0.04790658 0.02114775 -0.00123563 0.2543852 ]
mean value: 0.05005907069255838
key: train_mcc
value: [0.23263611 0.26559326 0.22727327 0.25075154 0.2245568 0.22202874
0.22477213 0.24260216 0.23742053 0.21015426]
mean value: 0.23377887989238055
key: test_accuracy
value: [0.46052632 0.43421053 0.43421053 0.42105263 0.47368421 0.44
0.44 0.45333333 0.44 0.50666667]
mean value: 0.4503684210526316
key: train_accuracy
value: [0.4904271 0.5095729 0.48748159 0.50073638 0.48600884 0.48529412
0.48676471 0.49558824 0.49264706 0.47794118]
mean value: 0.49124620982413586
key: test_fscore
value: [0.58585859 0.56565657 0.58252427 0.56 0.59183673 0.58823529
0.58 0.57731959 0.58 0.63366337]
mean value: 0.5845094406136836
key: train_fscore
value: [0.62309368 0.6320442 0.62173913 0.62788145 0.62106406 0.62121212
0.62188516 0.62513661 0.62377317 0.61704423]
mean value: 0.6234873813424298
key: test_precision
value: [0.43283582 0.41791045 0.42253521 0.41176471 0.43939394 0.42253521
0.42028986 0.43076923 0.42647059 0.46376812]
mean value: 0.42882731264872376
key: train_precision
value: [0.45253165 0.46203554 0.4511041 0.4576 0.4503937 0.45054945
0.45125786 0.45468998 0.45324881 0.44617785]
mean value: 0.4529588943309634
key: test_recall
value: [0.90625 0.875 0.9375 0.875 0.90625 0.96774194
0.93548387 0.875 0.90625 1. ]
mean value: 0.9184475806451613
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.52130682 0.49431818 0.50284091 0.48295455 0.53267045 0.51796188
0.51319648 0.50726744 0.49963663 0.56976744]
mean value: 0.5141920778490078
key: train_roc_auc
value: [0.55979644 0.57633588 0.55725191 0.56870229 0.55597964 0.55470738
0.55597964 0.56472081 0.56218274 0.54949239]
mean value: 0.5605149119747872
key: test_jcc
value: [0.41428571 0.3943662 0.4109589 0.38888889 0.42028986 0.41666667
0.4084507 0.4057971 0.4084507 0.46376812]
mean value: 0.41319228520484297
key: train_jcc
value: [0.45253165 0.46203554 0.4511041 0.4576 0.4503937 0.45054945
0.45125786 0.45468998 0.45324881 0.44617785]
mean value: 0.4529588943309634
MCC on Blind test: 0.06
Accuracy on Blind test: 0.45
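The QDA scores above show train_recall of 1.0 in every fold and train_jcc identical to train_precision. That is what the Jaccard definition predicts when there are no false negatives, and it suggests the model labels nearly every row as positive. The sketch below illustrates the two identities involved. The counts are an assumption inferred from the first training fold (they are not logged), and the helper names are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def jaccard(tp: int, fp: int, fn: int) -> float:
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Assumed counts for QDA training fold 1 (inferred, not logged)
tp, fp, fn = 286, 346, 0

# With fn == 0 the Jaccard index collapses to precision
print(precision(tp, fp), jaccard(tp, fp, fn))

# First QDA test fold: logged test_fscore is 0.58585859
print(jaccard_from_f1(0.58585859))
```

The second print can be compared against the first test_jcc entry logged for QDA.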
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02691889 0.03458738 0.03361678 0.04719758 0.03613091 0.04117703
0.0362668 0.04120827 0.04111934 0.03775907]
mean value: 0.03759820461273193
key: score_time
value: [0.02929235 0.02923179 0.02925587 0.03111696 0.01960897 0.0196991
0.01907682 0.01908278 0.01912689 0.01908231]
mean value: 0.02345738410949707
key: test_mcc
value: [0.57868822 0.38833971 0.46022727 0.46022727 0.59365605 0.5716838
0.61965619 0.59080018 0.59800506 0.61631563]
mean value: 0.5477599382068561
key: train_mcc
value: [0.70478986 0.71257893 0.7093521 0.71453678 0.68855355 0.70348941
0.69287079 0.73607702 0.69767103 0.72858931]
mean value: 0.7088508784572793
key: test_accuracy
value: [0.78947368 0.69736842 0.73684211 0.73684211 0.80263158 0.78666667
0.81333333 0.78666667 0.8 0.81333333]
mean value: 0.7763157894736842
key: train_accuracy
value: [0.85419735 0.85861561 0.85714286 0.86008837 0.84683358 0.85441176
0.84852941 0.87058824 0.85147059 0.86617647]
mean value: 0.8568054232002079
key: test_fscore
value: [0.76470588 0.65671642 0.6875 0.6875 0.76190476 0.75757576
0.78125 0.77777778 0.7761194 0.77419355]
mean value: 0.7425243548893857
key: train_fscore
value: [0.83248731 0.83617747 0.83418803 0.83648881 0.8225256 0.83076923
0.82571912 0.84879725 0.82735043 0.84550085]
mean value: 0.8340004105908049
key: test_precision
value: [0.72222222 0.62857143 0.6875 0.6875 0.77419355 0.71428571
0.75757576 0.7 0.74285714 0.8 ]
mean value: 0.7214705813899362
key: train_precision
value: [0.80655738 0.81666667 0.81605351 0.82372881 0.80333333 0.81543624
0.80263158 0.83445946 0.80936455 0.82178218]
mean value: 0.8150013709044559
key: test_recall
value: [0.8125 0.6875 0.6875 0.6875 0.75 0.80645161
0.80645161 0.875 0.8125 0.75 ]
mean value: 0.7675403225806452
key: train_recall
value: [0.86013986 0.85664336 0.85314685 0.84965035 0.84265734 0.8466899
0.85017422 0.86363636 0.84615385 0.87062937]
mean value: 0.8539521454155601
key: test_roc_auc
value: [0.79261364 0.69602273 0.73011364 0.73011364 0.79545455 0.78958944
0.81231672 0.79796512 0.80159884 0.80523256]
mean value: 0.775102085180386
key: train_roc_auc
value: [0.85500632 0.85834712 0.85659887 0.85866741 0.84626506 0.85337039
0.84875123 0.86963544 0.8507419 0.86678677]
mean value: 0.8564170512536526
key: test_jcc
value: [0.61904762 0.48888889 0.52380952 0.52380952 0.61538462 0.6097561
0.64102564 0.63636364 0.63414634 0.63157895]
mean value: 0.592381083472226
key: train_jcc
value: [0.71304348 0.71847507 0.71554252 0.71893491 0.69855072 0.71052632
0.70317003 0.73731343 0.70553936 0.73235294]
mean value: 0.7153448786669865
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.3526473 0.30884123 0.30870199 0.30832839 0.31096816 0.32195187
0.43453765 0.25615191 0.27564645 0.28402328]
mean value: 0.31617982387542726
key: score_time
value: [0.01904249 0.01905918 0.01906228 0.01901865 0.02059817 0.01903939
0.02135944 0.02266073 0.01898694 0.01897216]
mean value: 0.01977994441986084
key: test_mcc
value: [0.57868822 0.38833971 0.46022727 0.48519965 0.54874089 0.5716838
0.61965619 0.56896508 0.59800506 0.61631563]
mean value: 0.5435821511452588
key: train_mcc
value: [0.70478986 0.71257893 0.7093521 0.7393944 0.75183558 0.70348941
0.69287079 0.7581098 0.69767103 0.72858931]
mean value: 0.7198681212311817
key: test_accuracy
value: [0.78947368 0.69736842 0.73684211 0.75 0.77631579 0.78666667
0.81333333 0.77333333 0.8 0.81333333]
mean value: 0.7736666666666666
key: train_accuracy
value: [0.85419735 0.85861561 0.85714286 0.8718704 0.87776141 0.85441176
0.84852941 0.88088235 0.85147059 0.86617647]
mean value: 0.8621058217101273
key: test_fscore
value: [0.76470588 0.65671642 0.6875 0.6984127 0.74626866 0.75757576
0.78125 0.76712329 0.7761194 0.77419355]
mean value: 0.7409865652011667
key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.83248731 0.83617747 0.83418803 0.85128205 0.85860307 0.83076923
0.82571912 0.86201022 0.82735043 0.84550085]
mean value: 0.8404087784573542
key: test_precision
value: [0.72222222 0.62857143 0.6875 0.70967742 0.71428571 0.71428571
0.75757576 0.68292683 0.74285714 0.8 ]
mean value: 0.7159902228421111
key: train_precision
value: [0.80655738 0.81666667 0.81605351 0.83277592 0.8372093 0.81543624
0.80263158 0.84053156 0.80936455 0.82178218]
mean value: 0.8199008886212261
key: test_recall
value: [0.8125 0.6875 0.6875 0.6875 0.78125 0.80645161
0.80645161 0.875 0.8125 0.75 ]
mean value: 0.7706653225806451
key: train_recall
value: [0.86013986 0.85664336 0.85314685 0.87062937 0.88111888 0.8466899
0.85017422 0.88461538 0.84615385 0.87062937]
mean value: 0.861994103457518
key: test_roc_auc
value: [0.79261364 0.69602273 0.73011364 0.74147727 0.77698864 0.78958944
0.81231672 0.78633721 0.80159884 0.80523256]
mean value: 0.7732290672099843
key: train_roc_auc
value: [0.85500632 0.85834712 0.85659887 0.87170145 0.87821847 0.85337039
0.84875123 0.88139399 0.8507419 0.86678677]
mean value: 0.8620916513851831
key: test_jcc
value: [0.61904762 0.48888889 0.52380952 0.53658537 0.5952381 0.6097561
0.64102564 0.62222222 0.63414634 0.63157895]
mean value: 0.590229874247846
key: train_jcc
value: [0.71304348 0.71847507 0.71554252 0.74107143 0.75223881 0.71052632
0.70317003 0.75748503 0.70553936 0.73235294]
mean value: 0.7249444982435457
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
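Note on the SettingWithCopyWarning messages raised at katg_cd_sl.py:115 and :118 earlier in this log: pandas emits this warning when an in-place operation runs on a frame that was itself taken as a slice of another frame. The log does not show how baseline_CT and baseline_BT are built, so the sketch below assumes they are column subsets of a larger results table. The table name scores_df and all values are hypothetical. It shows one common way to avoid the warning: take an explicit copy and assign the sorted result instead of using inplace=True.

```python
import pandas as pd

# Hypothetical results table; column names follow the log, values are made up.
scores_df = pd.DataFrame({
    "model": ["A", "B", "C"],
    "test_mcc": [0.41, 0.61, 0.55],
    "bts_mcc": [0.36, 0.43, 0.40],
})

# An explicit .copy() makes each subset an independent DataFrame.
# Assigning the result of sort_values avoids the in-place write on a slice.
baseline_CT = scores_df[["model", "test_mcc"]].copy()
baseline_CT = baseline_CT.sort_values(by=["test_mcc"], ascending=False)

baseline_BT = scores_df[["model", "bts_mcc"]].copy()
baseline_BT = baseline_BT.sort_values(by=["bts_mcc"], ascending=False)

print(baseline_CT)
print(baseline_BT)
```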
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
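Note on the pipeline printed above and the ConvergenceWarning messages in this log: the warning text itself suggests raising max_iter or scaling the data. The pipeline already scales numeric columns with MinMaxScaler, so raising max_iter is the remaining option the message names. The sketch below rebuilds a pipeline of the same shape on toy data. The data, the reduced column lists and the value max_iter=1000 are assumptions for illustration only, not settings taken from the original script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in for the real feature table; only the column names are
# borrowed from the log, the values are randomly generated.
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
    "active_site": rng.choice(["yes", "no"], n),
})
y = (X["ligand_distance"] < 15).astype(int)  # made-up binary target

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class", "active_site"]

# Same structure as the logged pipeline: scale numeric columns,
# one-hot encode categorical columns, pass anything else through.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

# max_iter=1000 is an illustrative value (the scikit-learn default is 100).
pipe = Pipeline(steps=[
    ("prep", prep),
    ("model", LogisticRegression(random_state=42, max_iter=1000)),
])
pipe.fit(X, y)

# n_iter_ reports how many lbfgs iterations were actually used.
n_iter = int(pipe.named_steps["model"].n_iter_[0])
print("lbfgs iterations used:", n_iter)
```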
key: fit_time
value: [0.03952503 0.04458117 0.04192305 0.04709601 0.0482595 0.04177332
0.04183102 0.04927802 0.04940033 0.07689357]
mean value: 0.04805610179901123
key: score_time
value: [0.01453161 0.01288486 0.01447868 0.01298976 0.01239872 0.01299834
0.01297569 0.0130744 0.01301527 0.0146513 ]
mean value: 0.013399863243103027
key: test_mcc
value: [0.71976336 0.54545455 0.43192975 0.57188626 0.7472238 0.63521
0.60331932 0.57461562 0.63213531 0.67866682]
mean value: 0.6140204775170993
key: train_mcc
value: [0.71710512 0.68945784 0.70905448 0.70931293 0.66941284 0.68501726
0.68613192 0.68932352 0.69416521 0.69122274]
mean value: 0.6940203873531311
key: test_accuracy
value: [0.85227273 0.77272727 0.71590909 0.78409091 0.87356322 0.81609195
0.79310345 0.7816092 0.81609195 0.83908046]
mean value: 0.8044540229885058
key: train_accuracy
value: [0.85750636 0.84351145 0.85368957 0.85368957 0.83354511 0.841169
0.84243964 0.84371029 0.84625159 0.84371029]
mean value: 0.8459222867784708
key: test_fscore
value: [0.86597938 0.77272727 0.71910112 0.79569892 0.87058824 0.82222222
0.8125 0.80412371 0.81818182 0.84444444]
mean value: 0.8125567133980068
key: train_fscore
value: [0.8627451 0.84981685 0.85854859 0.85889571 0.84043849 0.84811665
0.84729064 0.84907975 0.85116851 0.85126965]
mean value: 0.8517369930941097
key: test_precision
value: [0.79245283 0.77272727 0.71111111 0.75510204 0.88095238 0.78723404
0.73584906 0.73584906 0.81818182 0.82608696]
mean value: 0.7815546566260066
key: train_precision
value: [0.8321513 0.81690141 0.83095238 0.82938389 0.80796253 0.81351981
0.82296651 0.81990521 0.82380952 0.81105991]
mean value: 0.8208612470780035
key: test_recall
value: [0.95454545 0.77272727 0.72727273 0.84090909 0.86046512 0.86046512
0.90697674 0.88636364 0.81818182 0.86363636]
mean value: 0.849154334038055
key: train_recall
value: [0.8956743 0.88549618 0.88804071 0.89058524 0.87563452 0.8857868
0.87309645 0.88040712 0.88040712 0.8956743 ]
mean value: 0.8850802753774816
key: test_roc_auc
value: [0.85227273 0.77272727 0.71590909 0.78409091 0.87341438 0.81659619
0.79439746 0.78039112 0.81606765 0.83879493]
mean value: 0.8044661733615223
key: train_roc_auc
value: [0.85750636 0.84351145 0.85368957 0.85368957 0.83349156 0.84111223
0.84240064 0.84375686 0.84629493 0.84377624]
mean value: 0.8459229408041746
key: test_jcc
value: [0.76363636 0.62962963 0.56140351 0.66071429 0.77083333 0.69811321
0.68421053 0.67241379 0.69230769 0.73076923]
mean value: 0.6864031571128872
key: train_jcc
value: [0.75862069 0.7388535 0.75215517 0.75268817 0.72478992 0.73628692
0.73504274 0.73773987 0.74089936 0.74105263]
mean value: 0.7418128969385925
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
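Note on the key/value/mean blocks above: the key names (fit_time, score_time, test_mcc, train_mcc, ...) match what scikit-learn's cross_validate returns when it is given a dict of scorers and return_train_score=True. Whether the original script calls cross_validate this way is an assumption. The scorer definitions, the toy data and the fold setup below are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Toy data standing in for the real training set.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Dict keys become the suffixes of the result keys: test_<name>, train_<name>.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    LogisticRegression(random_state=42, max_iter=1000),
    X, y, cv=cv, scoring=scoring, return_train_score=True,
)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```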
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.98496866 0.95473599 1.09272647 0.97494578 1.13974237 1.4481535
1.49764705 1.4215436 1.07872605 1.23035407]
mean value: 1.1823543548583983
key: score_time
value: [0.01475596 0.01501179 0.0150342 0.01500845 0.0150547 0.01534939
0.01498508 0.01506901 0.01850271 0.01581573]
mean value: 0.01545870304107666
key: test_mcc
value: [0.66759342 0.66062825 0.50051733 0.5933661 0.77102073 0.65696218
0.81702814 0.61269937 0.60940803 0.63444041]
mean value: 0.6523663980572162
key: train_mcc
value: [0.79768851 0.81031799 0.79792694 0.81520262 0.81278376 0.81320255
0.78072794 0.82034342 0.82365593 0.81322894]
mean value: 0.8085078601766735
key: test_accuracy
value: [0.82954545 0.82954545 0.75 0.79545455 0.88505747 0.82758621
0.90804598 0.8045977 0.8045977 0.81609195]
mean value: 0.8250522466039707
key: train_accuracy
value: [0.89821883 0.90458015 0.89821883 0.90712468 0.90597205 0.90597205
0.88945362 0.90978399 0.91105464 0.90597205]
mean value: 0.9036350879915678
key: test_fscore
value: [0.84210526 0.83516484 0.74418605 0.80434783 0.88636364 0.83146067
0.90909091 0.8172043 0.8045977 0.82608696]
mean value: 0.8300608149279597
key: train_fscore
value: [0.9009901 0.9070632 0.90123457 0.90931677 0.90818859 0.90864198
0.89325153 0.91158157 0.91358025 0.90841584]
mean value: 0.9062264386395962
key: test_precision
value: [0.78431373 0.80851064 0.76190476 0.77083333 0.86666667 0.80434783
0.88888889 0.7755102 0.81395349 0.79166667]
mean value: 0.8066596199789068
key: train_precision
value: [0.87710843 0.88405797 0.87529976 0.88834951 0.88834951 0.88461538
0.86460808 0.89268293 0.88729017 0.88433735]
mean value: 0.8826699098784945
key: test_recall
value: [0.90909091 0.86363636 0.72727273 0.84090909 0.90697674 0.86046512
0.93023256 0.86363636 0.79545455 0.86363636]
mean value: 0.8561310782241015
key: train_recall
value: [0.92620865 0.93129771 0.92875318 0.93129771 0.92893401 0.93401015
0.92385787 0.93129771 0.94147583 0.93384224]
mean value: 0.9310975058446674
key: test_roc_auc
value: [0.82954545 0.82954545 0.75 0.79545455 0.88530655 0.82795983
0.9082981 0.80391121 0.80470402 0.81553911]
mean value: 0.8250264270613108
key: train_roc_auc
value: [0.89821883 0.90458015 0.89821883 0.90712468 0.90594283 0.90593637
0.88940985 0.90981129 0.91109324 0.90600741]
mean value: 0.903634349853399
key: test_jcc
value: [0.72727273 0.71698113 0.59259259 0.67272727 0.79591837 0.71153846
0.83333333 0.69090909 0.67307692 0.7037037 ]
mean value: 0.7118053604576515
key: train_jcc
value: [0.81981982 0.82993197 0.82022472 0.83371298 0.83181818 0.83257919
0.80709534 0.8375286 0.84090909 0.83219955]
mean value: 0.8285819448297327
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
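Note on the key/value blocks above: their names (fit_time, score_time, test_mcc, train_mcc, ...) match what sklearn's cross_validate returns when it is given a dict of scorers and return_train_score=True, with ten values per key for ten folds. The sketch below reproduces that output shape on synthetic stand-in data. The scorer mapping (for example "jcc" to Jaccard and "fscore" to F1) is inferred from the key names in this log and is an assumption, as are the fold settings and the model used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not reproduced here.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Assumed mapping from the log's key suffixes to sklearn scorers.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[
    ("scale", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

# Ten stratified folds give ten values per key, as in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    pipe, X, y, cv=cv, scoring=scoring, return_train_score=True
)

# Print each metric in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

A large gap between a train_ mean and its test_ counterpart, as with train_mcc and test_mcc above, is the usual sign of overfitting to the training folds.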
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01602221 0.01385975 0.01371217 0.0158987 0.01669502 0.01303077
0.01360488 0.01454973 0.01411724 0.01296067]
mean value: 0.014445114135742187
key: score_time
value: [0.01249313 0.01088738 0.01216459 0.0105288 0.00993156 0.01178885
0.01053643 0.01033878 0.01024318 0.01122117]
mean value: 0.011013388633728027
key: test_mcc
value: [0.28767798 0.40951418 0.29926602 0.56832862 0.60920157 0.51879367
0.51879367 0.24188306 0.42547569 0.35627361]
mean value: 0.4235208045282095
key: train_mcc
value: [0.41486656 0.49201246 0.46947331 0.46683522 0.46468129 0.45990325
0.47404657 0.47833815 0.46483462 0.50586826]
mean value: 0.46908596855294155
key: test_accuracy
value: [0.63636364 0.70454545 0.64772727 0.78409091 0.8045977 0.75862069
0.75862069 0.62068966 0.71264368 0.67816092]
mean value: 0.7106060606060606
key: train_accuracy
value: [0.69974555 0.74554707 0.73409669 0.73282443 0.73189327 0.72935197
0.73570521 0.73824651 0.73189327 0.75222363]
mean value: 0.7331527590521547
key: test_fscore
value: [0.68627451 0.71111111 0.67368421 0.78651685 0.8 0.76404494
0.76404494 0.64516129 0.71264368 0.68888889]
mean value: 0.7232370430386771
key: train_fscore
value: [0.73542601 0.75308642 0.74355828 0.74201474 0.7404674 0.73929009
0.74939759 0.74878049 0.72200264 0.7607362 ]
mean value: 0.7434759852829844
key: test_precision
value: [0.60344828 0.69565217 0.62745098 0.77777778 0.80952381 0.73913043
0.73913043 0.6122449 0.72093023 0.67391304]
mean value: 0.6999202061029658
key: train_precision
value: [0.65731463 0.73141487 0.71800948 0.71733967 0.71837709 0.71394799
0.71330275 0.71896956 0.74863388 0.73459716]
mean value: 0.7171907065852907
key: test_recall
value: [0.79545455 0.72727273 0.72727273 0.79545455 0.79069767 0.79069767
0.79069767 0.68181818 0.70454545 0.70454545]
mean value: 0.750845665961945
key: train_recall
value: [0.8346056 0.77608142 0.77099237 0.76844784 0.76395939 0.76649746
0.7893401 0.78117048 0.69720102 0.78880407]
mean value: 0.7737099753296909
key: test_roc_auc
value: [0.63636364 0.70454545 0.64772727 0.78409091 0.80443975 0.7589852
0.7589852 0.61997886 0.71273784 0.67785412]
mean value: 0.7105708245243129
key: train_roc_auc
value: [0.69974555 0.74554707 0.73409669 0.73282443 0.73185247 0.72930471
0.73563697 0.73830098 0.73184924 0.75227006]
mean value: 0.7331428165484817
key: test_jcc
value: [0.52238806 0.55172414 0.50793651 0.64814815 0.66666667 0.61818182
0.61818182 0.47619048 0.55357143 0.52542373]
mean value: 0.568841279032295
key: train_jcc
value: [0.58156028 0.6039604 0.59179688 0.58984375 0.58789062 0.58640777
0.59922929 0.59844055 0.56494845 0.61386139]
mean value: 0.5917939369364226
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01541185 0.01635194 0.01640701 0.01618457 0.01612496 0.01588988
0.02039766 0.01608372 0.01671124 0.02363396]
mean value: 0.017319679260253906
key: score_time
value: [0.01239347 0.01237416 0.01222992 0.0126605 0.01228976 0.01222897
0.01241994 0.01231122 0.01263213 0.01260352]
mean value: 0.012414360046386718
key: test_mcc
value: [0.68252363 0.36514837 0.34530694 0.52613536 0.61090601 0.3853797
0.52126134 0.40330006 0.54295079 0.24714945]
mean value: 0.46300616567387165
key: train_mcc
value: [0.47428882 0.51540005 0.50501003 0.4893689 0.47802164 0.49540494
0.48842804 0.50058709 0.49168322 0.49679032]
mean value: 0.4934983053216603
key: test_accuracy
value: [0.84090909 0.68181818 0.67045455 0.76136364 0.8045977 0.68965517
0.75862069 0.70114943 0.77011494 0.62068966]
mean value: 0.729937304075235
key: train_accuracy
value: [0.73536896 0.75699746 0.7519084 0.74300254 0.73824651 0.74714104
0.7433291 0.74968234 0.74459975 0.74714104]
mean value: 0.7457417124972922
key: test_fscore
value: [0.84444444 0.69565217 0.69473684 0.77419355 0.80898876 0.70967742
0.76923077 0.7173913 0.76190476 0.66666667]
mean value: 0.7442886694399654
key: train_fscore
value: [0.75059952 0.76564417 0.7601476 0.75721154 0.74878049 0.75582822
0.75425791 0.75768758 0.75636364 0.75878788]
mean value: 0.7565308540334024
key: test_precision
value: [0.82608696 0.66666667 0.64705882 0.73469388 0.7826087 0.66
0.72916667 0.6875 0.8 0.6 ]
mean value: 0.7133781686587679
key: train_precision
value: [0.70975057 0.73933649 0.73571429 0.71753986 0.72065728 0.73159145
0.72429907 0.73333333 0.72222222 0.72453704]
mean value: 0.725898159276402
key: test_recall
value: [0.86363636 0.72727273 0.75 0.81818182 0.8372093 0.76744186
0.81395349 0.75 0.72727273 0.75 ]
mean value: 0.7804968287526427
key: train_recall
value: [0.79643766 0.79389313 0.78625954 0.80152672 0.77918782 0.78172589
0.78680203 0.78371501 0.79389313 0.79643766]
mean value: 0.7899878585913382
key: test_roc_auc
value: [0.84090909 0.68181818 0.67045455 0.76136364 0.80496829 0.69053911
0.75924947 0.7005814 0.77061311 0.61918605]
mean value: 0.7299682875264271
key: train_roc_auc
value: [0.73536896 0.75699746 0.7519084 0.74300254 0.73819442 0.74709704
0.74327379 0.74972553 0.7446623 0.7472036 ]
mean value: 0.7457434029526873
key: test_jcc
value: [0.73076923 0.53333333 0.53225806 0.63157895 0.67924528 0.55
0.625 0.55932203 0.61538462 0.5 ]
mean value: 0.5956891508288903
key: train_jcc
value: [0.60076775 0.62027833 0.61309524 0.60928433 0.59844055 0.60749507
0.60546875 0.60990099 0.60818713 0.61132812]
mean value: 0.6084246269566757
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01486754 0.01294136 0.01227379 0.01070833 0.01139832 0.01107717
0.01244855 0.01302791 0.01268315 0.01264668]
mean value: 0.012407279014587403
key: score_time
value: [0.04071927 0.02211118 0.01623821 0.01385784 0.01441145 0.01481605
0.01512694 0.01559448 0.01551914 0.01509857]
mean value: 0.018349313735961915
key: test_mcc
value: [0.54772256 0.54601891 0.38646346 0.54601891 0.56748941 0.58327727
0.35695404 0.42729122 0.35695404 0.42729122]
mean value: 0.47454810311174817
key: train_mcc
value: [0.63569864 0.64349398 0.60969189 0.61419466 0.63993928 0.62666052
0.65652237 0.62817177 0.65053867 0.63436197]
mean value: 0.633927376906168
key: test_accuracy
value: [0.77272727 0.77272727 0.69318182 0.77272727 0.7816092 0.7816092
0.67816092 0.71264368 0.67816092 0.71264368]
mean value: 0.7356191222570533
key: train_accuracy
value: [0.81679389 0.82061069 0.80407125 0.80661578 0.81956798 0.81321474
0.82719187 0.81321474 0.82465057 0.81702668]
mean value: 0.8162958185010233
key: test_fscore
value: [0.7826087 0.77777778 0.69662921 0.77777778 0.79120879 0.80412371
0.68181818 0.7311828 0.6744186 0.7311828 ]
mean value: 0.7448728345107067
key: train_fscore
value: [0.82396088 0.82783883 0.81081081 0.81188119 0.82425743 0.81602003
0.83414634 0.8196319 0.82962963 0.81954887]
mean value: 0.8217725902851899
key: test_precision
value: [0.75 0.76086957 0.68888889 0.76086957 0.75 0.72222222
0.66666667 0.69387755 0.69047619 0.69387755]
mean value: 0.7177748200729567
key: train_precision
value: [0.79294118 0.79577465 0.78384798 0.79036145 0.80434783 0.80493827
0.8028169 0.79146919 0.8057554 0.80740741]
mean value: 0.7979660247642671
key: test_recall
value: [0.81818182 0.79545455 0.70454545 0.79545455 0.8372093 0.90697674
0.69767442 0.77272727 0.65909091 0.77272727]
mean value: 0.7760042283298098
key: train_recall
value: [0.85750636 0.86259542 0.83969466 0.8346056 0.84517766 0.82741117
0.8680203 0.84987277 0.85496183 0.83206107]
mean value: 0.8471906846979502
key: test_roc_auc
value: [0.77272727 0.77272727 0.69318182 0.77272727 0.78224101 0.78303383
0.67838266 0.71194503 0.67838266 0.71194503]
mean value: 0.7357293868921776
key: train_roc_auc
value: [0.81679389 0.82061069 0.80407125 0.80661578 0.8195354 0.81319668
0.82713992 0.81326126 0.82468904 0.81704576]
mean value: 0.816295966210718
key: test_jcc
value: [0.64285714 0.63636364 0.53448276 0.63636364 0.65454545 0.67241379
0.51724138 0.57627119 0.50877193 0.57627119]
mean value: 0.595558210387027
key: train_jcc
value: [0.7006237 0.70625 0.68181818 0.68333333 0.70105263 0.68921776
0.71548117 0.69438669 0.70886076 0.69426752]
mean value: 0.6975291747691413
MCC on Blind test: 0.22
Accuracy on Blind test: 0.63
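Note: the per-fold metrics above can be cross-checked by hand. The sketch below recomputes fold 1 of the K-Nearest Neighbors run from a confusion matrix. The counts (tp, fn, fp, tn) are not printed in this log; they are inferred from the reported accuracy, precision and recall under the assumption of an 88-sample test fold with 44 positives. The reported test_roc_auc equals the mean of sensitivity and specificity, which suggests it was scored on predicted labels rather than probabilities; that is also an inference, not something the log states.

```python
from math import sqrt

# Assumed counts for fold 1, inferred from the reported metrics
# (recall 36/44, precision 36/48, accuracy 68/88); not taken from the log.
tp, fn, fp, tn = 36, 8, 12, 32

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)   # F1 score
jcc = tp / (tp + fp + fn)              # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# For hard 0/1 predictions the ROC curve has a single operating point,
# so its AUC is the mean of sensitivity and specificity.
roc_auc_hard = (recall + tn / (tn + fp)) / 2

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jcc", jcc), ("mcc", mcc), ("roc_auc", roc_auc_hard)]:
    print(f"{name}: {value:.8f}")
```

Each printed value can be compared against the first entry of the matching test_* array above.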
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.04696131 0.05472875 0.05334735 0.04616427 0.05575418 0.04985499
0.05442619 0.05384159 0.04604459 0.05362082]
mean value: 0.05147440433502197
key: score_time
value: [0.01842785 0.01855063 0.01799917 0.01824617 0.01761675 0.01863289
0.01906061 0.01764941 0.01791215 0.0178628 ]
mean value: 0.018195843696594237
key: test_mcc
value: [0.67124862 0.48038446 0.52286233 0.5547002 0.70137421 0.58908039
0.5504913 0.52749822 0.60940803 0.50171077]
mean value: 0.5708758522241896
key: train_mcc
value: [0.67561944 0.69521698 0.67661176 0.68605278 0.66161012 0.66633066
0.66558603 0.66092901 0.66447916 0.65813246]
mean value: 0.6710568396829444
key: test_accuracy
value: [0.81818182 0.73863636 0.76136364 0.77272727 0.85057471 0.79310345
0.77011494 0.75862069 0.8045977 0.74712644]
mean value: 0.7815047021943573
key: train_accuracy
value: [0.8346056 0.84478372 0.83587786 0.84096692 0.82846252 0.83100381
0.83100381 0.82846252 0.82973316 0.82465057]
mean value: 0.8329550488051706
key: test_fscore
value: [0.84313725 0.75268817 0.76404494 0.79166667 0.85057471 0.8
0.78723404 0.78350515 0.8045977 0.77083333]
mean value: 0.7948281981750667
key: train_fscore
value: [0.8452381 0.85406699 0.84513806 0.84921592 0.83832335 0.84033613
0.83956574 0.8371532 0.83932854 0.83764706]
mean value: 0.8426013081125754
key: test_precision
value: [0.74137931 0.71428571 0.75555556 0.73076923 0.84090909 0.76595745
0.7254902 0.71698113 0.81395349 0.71153846]
mean value: 0.7516819626737388
key: train_precision
value: [0.79418345 0.80586907 0.8 0.80733945 0.79365079 0.79726651
0.8 0.79587156 0.79365079 0.77899344]
mean value: 0.7966825066413111
key: test_recall
value: [0.97727273 0.79545455 0.77272727 0.86363636 0.86046512 0.8372093
0.86046512 0.86363636 0.79545455 0.84090909]
mean value: 0.846723044397463
key: train_recall
value: [0.90330789 0.90839695 0.8956743 0.8956743 0.88832487 0.88832487
0.88324873 0.88295165 0.89058524 0.90585242]
mean value: 0.8942341225248963
key: test_roc_auc
value: [0.81818182 0.73863636 0.76136364 0.77272727 0.8506871 0.79360465
0.77114165 0.75739958 0.80470402 0.74603594]
mean value: 0.7814482029598309
key: train_roc_auc
value: [0.8346056 0.84478372 0.83587786 0.84096692 0.82838636 0.83093088
0.83093734 0.82853166 0.82981039 0.82475362]
mean value: 0.8329584350499218
key: test_jcc
value: [0.72881356 0.60344828 0.61818182 0.65517241 0.74 0.66666667
0.64912281 0.6440678 0.67307692 0.62711864]
mean value: 0.6605668904598124
key: train_jcc
value: [0.73195876 0.74530271 0.73180873 0.73794549 0.72164948 0.72463768
0.72349272 0.71991701 0.7231405 0.72064777]
mean value: 0.7280500872128758
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
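Note: a minimal sketch of how output in this key/value/mean shape could be produced with scikit-learn. The script that wrote this log is not shown, so everything here is an assumption: the toy data, the column names (num_a, num_b, cat_a), the reduced scoring dictionary, the StratifiedKFold settings, and handle_unknown="ignore" (the log shows a default OneHotEncoder()).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature table.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (X["num_a"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Scale numerical columns, one-hot encode categorical ones.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", KNeighborsClassifier())])

# Scorer names become the test_*/train_* keys of the result dict.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "jcc": make_scorer(jaccard_score),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

Fitting the preprocessing inside the Pipeline means the scaler and encoder are refit on each training fold, so no test-fold statistics leak into training.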
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.06535935 2.9703083 3.16897154 1.63516951 3.05120897 2.99582481
3.054533 3.10651636 3.26628566 2.91335678]
mean value: 2.822753429412842
key: score_time
value: [0.01267791 0.01468849 0.01453876 0.0129478 0.014781 0.01487565
0.01514196 0.01463127 0.01515675 0.01527548]
mean value: 0.01447150707244873
key: test_mcc
value: [0.68070616 0.5 0.43463356 0.52394654 0.70137421 0.56484984
0.51744186 0.47145877 0.54198427 0.58821234]
mean value: 0.5524607560393185
key: train_mcc
value: [0.87500374 0.95173097 0.9417228 0.87295442 0.93960096 0.94162706
0.93647031 0.97972048 0.91900117 0.93920448]
mean value: 0.9297036393442298
key: test_accuracy
value: [0.82954545 0.75 0.71590909 0.76136364 0.85057471 0.7816092
0.75862069 0.73563218 0.77011494 0.79310345]
mean value: 0.7746473354231975
key: train_accuracy
value: [0.9351145 0.97582697 0.97073791 0.93638677 0.96950445 0.9707751
0.9682338 0.98983482 0.95806861 0.96950445]
mean value: 0.9643987377582923
key: test_fscore
value: [0.84848485 0.75 0.69879518 0.75294118 0.85057471 0.78651685
0.75862069 0.73563218 0.7826087 0.80434783]
mean value: 0.7768522167556939
key: train_fscore
value: [0.93833132 0.97567222 0.97039897 0.93573265 0.97007481 0.9706258
0.96831432 0.98987342 0.95960832 0.9697733 ]
mean value: 0.9648405125048765
key: test_precision
value: [0.76363636 0.75 0.74358974 0.7804878 0.84090909 0.76086957
0.75 0.74418605 0.75 0.77083333]
mean value: 0.76545119480756
key: train_precision
value: [0.89400922 0.98195876 0.98177083 0.94545455 0.95343137 0.97686375
0.96708861 0.98488665 0.9245283 0.96009975]
mean value: 0.9570091794005952
key: test_recall
value: [0.95454545 0.75 0.65909091 0.72727273 0.86046512 0.81395349
0.76744186 0.72727273 0.81818182 0.84090909]
mean value: 0.7919133192389006
key: train_recall
value: [0.98727735 0.96946565 0.95928753 0.92620865 0.98730964 0.96446701
0.96954315 0.99491094 0.99745547 0.97964377]
mean value: 0.9735569160822
key: test_roc_auc
value: [0.82954545 0.75 0.71590909 0.76136364 0.8506871 0.78197674
0.75872093 0.73572939 0.76955603 0.79254757]
mean value: 0.7746035940803383
key: train_roc_auc
value: [0.9351145 0.97582697 0.97073791 0.93638677 0.96948179 0.97078312
0.96823213 0.98984126 0.9581186 0.96951731]
mean value: 0.9644040376641996
key: test_jcc
value: [0.73684211 0.6 0.53703704 0.60377358 0.74 0.64814815
0.61111111 0.58181818 0.64285714 0.67272727]
mean value: 0.6374314583867712
key: train_jcc
value: [0.88382688 0.9525 0.9425 0.87922705 0.94188862 0.94292804
0.93857494 0.97994987 0.92235294 0.94132029]
mean value: 0.9325068639804781
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
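Note: the key / value / mean value triples in this log are regular enough to parse mechanically. The sketch below is one possible approach using only the standard library; parse_cv_block is a hypothetical helper name, and the regular expression assumes each value array is a single bracketed list of whitespace-separated floats, as in the blocks above.

```python
import re

def parse_cv_block(text):
    """Parse 'key: / value: [...] / mean value:' triples into a dict."""
    pattern = re.compile(
        r"key:\s*(\w+)\s*value:\s*\[([^\]]*)\]\s*mean value:\s*([-+\d.eE]+)"
    )
    parsed = {}
    for key, values, mean in pattern.findall(text):
        parsed[key] = {
            # The array may wrap across lines; split() handles any whitespace.
            "values": [float(v) for v in values.split()],
            "mean": float(mean),
        }
    return parsed

# Sample copied from the MLP block of this log.
sample = """key: test_mcc
value: [0.68070616 0.5 0.43463356 0.52394654 0.70137421 0.56484984
0.51744186 0.47145877 0.54198427 0.58821234]
mean value: 0.5524607560393185
"""

result = parse_cv_block(sample)
print(result["test_mcc"]["mean"], len(result["test_mcc"]["values"]))
```

The parsed dictionaries can then be collected per model into a table for side-by-side comparison.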
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06067634 0.05294609 0.04533696 0.05182934 0.04592848 0.05230975
0.04506969 0.04694581 0.05104733 0.05060101]
mean value: 0.05026907920837402
key: score_time
value: [0.00971699 0.00921226 0.00920987 0.00925636 0.00938106 0.00934958
0.00928903 0.00924182 0.00931191 0.00933528]
mean value: 0.009330415725708007
key: test_mcc
value: [0.86452993 0.68181818 0.75174939 0.79730996 0.72410148 0.4957562
0.65520898 0.65641902 0.84118687 0.79323121]
mean value: 0.7261311211803974
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93181818 0.84090909 0.875 0.89772727 0.86206897 0.74712644
0.82758621 0.82758621 0.91954023 0.89655172]
mean value: 0.8625914315569487
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93023256 0.84090909 0.87912088 0.9010989 0.86046512 0.73170732
0.82352941 0.83516484 0.91764706 0.8988764 ]
mean value: 0.8618751572868099
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95238095 0.84090909 0.85106383 0.87234043 0.86046512 0.76923077
0.83333333 0.80851064 0.95121951 0.88888889]
mean value: 0.8628342556834248
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.84090909 0.90909091 0.93181818 0.86046512 0.69767442
0.81395349 0.86363636 0.88636364 0.90909091]
mean value: 0.8622093023255814
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93181818 0.84090909 0.875 0.89772727 0.86205074 0.74656448
0.82743129 0.82716702 0.919926 0.89640592]
mean value: 0.8625
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.86956522 0.7254902 0.78431373 0.82 0.75510204 0.57692308
0.7 0.71698113 0.84782609 0.81632653]
mean value: 0.7612528006343573
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.53
Accuracy on Blind test: 0.77
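Note: the Decision Tree block reports a perfect training score next to noticeably lower cross-validation and blind-test scores, a pattern usually read as overfitting. The sketch below quantifies the gaps using only numbers copied from the block above; the variable names are illustrative.

```python
from statistics import mean

# Values copied from the Decision Tree block of this log.
test_mcc = [0.86452993, 0.68181818, 0.75174939, 0.79730996, 0.72410148,
            0.4957562, 0.65520898, 0.65641902, 0.84118687, 0.79323121]
train_mcc = [1.0] * 10
blind_mcc = 0.53

cv_mean = mean(test_mcc)
train_cv_gap = mean(train_mcc) - cv_mean   # training vs. cross-validation
cv_blind_gap = cv_mean - blind_mcc         # cross-validation vs. blind test

print(f"mean CV MCC:      {cv_mean:.4f}")
print(f"train - CV gap:   {train_cv_gap:.4f}")
print(f"CV - blind gap:   {cv_blind_gap:.4f}")
```

A large train/CV gap with a further drop on the blind set suggests the unpruned tree memorises the training folds; limiting tree depth or leaf size would be one thing to try.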
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.17999911 0.17724442 0.17448497 0.17866874 0.17901969 0.17455459
0.1772244 0.1795907 0.18107581 0.17981648]
mean value: 0.17816789150238038
key: score_time
value: [0.02069139 0.01987624 0.02035975 0.01991272 0.01992059 0.0199163
0.02066851 0.019912 0.02007174 0.02016401]
mean value: 0.02014932632446289
key: test_mcc
value: [0.73029674 0.56950711 0.54772256 0.63702206 0.63213531 0.58908039
0.61090601 0.54198427 0.61371748 0.59116498]
mean value: 0.6063536913398097
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.86363636 0.78409091 0.77272727 0.81818182 0.81609195 0.79310345
0.8045977 0.77011494 0.8045977 0.79310345]
mean value: 0.8020245559038662
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.86956522 0.79120879 0.76190476 0.82222222 0.81395349 0.8
0.80898876 0.7826087 0.79518072 0.80851064]
mean value: 0.8054143301985729
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.83333333 0.76595745 0.8 0.80434783 0.81395349 0.76595745
0.7826087 0.75 0.84615385 0.76 ]
mean value: 0.7922312083215425
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.81818182 0.72727273 0.84090909 0.81395349 0.8372093
0.8372093 0.81818182 0.75 0.86363636]
mean value: 0.8215644820295983
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86363636 0.78409091 0.77272727 0.81818182 0.81606765 0.79360465
0.80496829 0.76955603 0.80523256 0.7922833 ]
mean value: 0.8020348837209302
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.76923077 0.65454545 0.61538462 0.69811321 0.68627451 0.66666667
0.67924528 0.64285714 0.66 0.67857143]
mean value: 0.6750889077626037
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01222849 0.01173115 0.01190543 0.01236916 0.01223779 0.01243281
0.01243258 0.01199484 0.01241732 0.01229382]
mean value: 0.012204337120056152
key: score_time
value: [0.00952125 0.00915074 0.00955224 0.00985074 0.00988984 0.00982094
0.0091567 0.00913739 0.00936317 0.00908303]
mean value: 0.009452605247497558
key: test_mcc
value: [0.43192975 0.41294832 0.36363636 0.45454545 0.29237545 0.31434142
0.33456898 0.14917898 0.33315711 0.31094663]
mean value: 0.3397628443637556
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71590909 0.70454545 0.68181818 0.72727273 0.64367816 0.65517241
0.66666667 0.57471264 0.66666667 0.65517241]
mean value: 0.6691614420062696
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.71264368 0.72340426 0.68181818 0.72727273 0.66666667 0.67391304
0.6741573 0.60215054 0.6741573 0.65116279]
mean value: 0.6787346487789561
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72093023 0.68 0.68181818 0.72727273 0.62 0.63265306
0.65217391 0.57142857 0.66666667 0.66666667]
mean value: 0.6619610020678921
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.70454545 0.77272727 0.68181818 0.72727273 0.72093023 0.72093023
0.69767442 0.63636364 0.68181818 0.63636364]
mean value: 0.6980443974630021
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.71590909 0.70454545 0.68181818 0.72727273 0.64455603 0.65591966
0.66701903 0.57399577 0.66649049 0.65539112]
mean value: 0.669291754756871
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.55357143 0.56666667 0.51724138 0.57142857 0.5 0.50819672
0.50847458 0.43076923 0.50847458 0.48275862]
mean value: 0.5147581771289745
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.08
Accuracy on Blind test: 0.55
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.77516341 2.72674942 2.75992584 2.77896261 2.67283797 2.68284583
2.66170597 2.65351701 2.66825771 2.68729377]
mean value: 2.7067259550094604
key: score_time
value: [0.10390854 0.10599661 0.09895778 0.10612893 0.10672951 0.10571718
0.10673928 0.10393858 0.09965968 0.10147977]
mean value: 0.10392558574676514
key: test_mcc
value: [0.86722738 0.79730996 0.72802521 0.75019377 0.72746922 0.81702814
0.84118687 0.77077916 0.77359882 0.86289151]
mean value: 0.7935710044788394
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93181818 0.89772727 0.86363636 0.875 0.86206897 0.90804598
0.91954023 0.88505747 0.88505747 0.93103448]
mean value: 0.8958986415882968
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93478261 0.9010989 0.86046512 0.87640449 0.86666667 0.90909091
0.92134831 0.88888889 0.88095238 0.93333333]
mean value: 0.8973031613994567
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89583333 0.87234043 0.88095238 0.86666667 0.82978723 0.88888889
0.89130435 0.86956522 0.925 0.91304348]
mean value: 0.8833381972893999
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.97727273 0.93181818 0.84090909 0.88636364 0.90697674 0.93023256
0.95348837 0.90909091 0.84090909 0.95454545]
mean value: 0.9131606765327696
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93181818 0.89772727 0.86363636 0.875 0.86257928 0.9082981
0.919926 0.88477801 0.88557082 0.9307611 ]
mean value: 0.8960095137420718
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.87755102 0.82 0.75510204 0.78 0.76470588 0.83333333
0.85416667 0.8 0.78723404 0.875 ]
mean value: 0.8147092986130622
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.63
Accuracy on Blind test: 0.82
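
The per-fold keys above (`fit_time`, `score_time`, `test_mcc`, `train_jcc`, and so on) have the shape that scikit-learn's `cross_validate` returns when given a scoring dictionary and `return_train_score=True`. The following is a minimal sketch of how output of this form could be produced, not the script that generated this log. The synthetic data, the column names `num_*` and `cat_0`, the scorer definitions, the fold settings and the reduced `n_estimators` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (assumption): 5 numerical columns, 1 categorical column.
X_num, y = make_classification(n_samples=200, n_features=5, random_state=42)
X = pd.DataFrame(X_num, columns=[f"num_{i}" for i in range(5)])
rng = np.random.default_rng(42)
X["cat_0"] = rng.choice(["a", "b", "c"], size=len(X))

num_cols = [c for c in X.columns if c.startswith("num_")]
cat_cols = ["cat_0"]

# Same layout as the logged Pipeline repr: scale numerical columns,
# one-hot encode categorical columns, then fit the model.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(
    steps=[
        ("prep", prep),
        # Fewer trees than the logged model so the sketch runs quickly.
        ("model", RandomForestClassifier(n_estimators=50, random_state=42)),
    ]
)

# Scorer names chosen to reproduce the key names seen in the log (assumption).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each entry of `scores` is an array with one element per fold, which is why every `value:` line in the log holds ten numbers.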
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.05537844 1.15527582 1.09323168 1.14960814 1.12652636 1.13774252
1.10354066 1.09963131 1.08902621 1.08362079]
mean value: 1.1093581914901733
key: score_time
value: [0.27662992 0.25080609 0.23210335 0.27736449 0.28904748 0.29127479
0.25754666 0.27043676 0.2904017 0.28125048]
mean value: 0.27168617248535154
key: test_mcc
value: [0.88843109 0.75174939 0.70472748 0.79566006 0.74735729 0.83932347
0.84118687 0.77077916 0.77102073 0.86289151]
mean value: 0.7973127061523191
key: train_mcc
value: [0.90915072 0.91129852 0.91627083 0.91638957 0.91910275 0.91119668
0.92405699 0.9217261 0.91129545 0.91402422]
mean value: 0.9154511824913257
key: test_accuracy
value: [0.94318182 0.875 0.85227273 0.89772727 0.87356322 0.91954023
0.91954023 0.88505747 0.88505747 0.93103448]
mean value: 0.8981974921630094
key: train_accuracy
value: [0.95419847 0.95547074 0.95801527 0.95801527 0.95933926 0.95552732
0.96188056 0.96060991 0.95552732 0.95679797]
mean value: 0.957538208353945
key: test_fscore
value: [0.94505495 0.87912088 0.85057471 0.8988764 0.87356322 0.91954023
0.92134831 0.88888889 0.88372093 0.93333333]
mean value: 0.8994021856651269
key: train_fscore
value: [0.95511222 0.95608532 0.95849057 0.95859473 0.96 0.95597484
0.96240602 0.9612015 0.95597484 0.95739348]
mean value: 0.9581233521836118
key: test_precision
value: [0.91489362 0.85106383 0.86046512 0.88888889 0.86363636 0.90909091
0.89130435 0.86956522 0.9047619 0.91304348]
mean value: 0.8866713672943908
key: train_precision
value: [0.93643032 0.94306931 0.94776119 0.94554455 0.94581281 0.94763092
0.95049505 0.94581281 0.94527363 0.94320988]
mean value: 0.945104046961017
key: test_recall
value: [0.97727273 0.90909091 0.84090909 0.90909091 0.88372093 0.93023256
0.95348837 0.90909091 0.86363636 0.95454545]
mean value: 0.913107822410148
key: train_recall
value: [0.97455471 0.96946565 0.96946565 0.97201018 0.97461929 0.96446701
0.97461929 0.97709924 0.96692112 0.97201018]
mean value: 0.9715232301313597
key: test_roc_auc
value: [0.94318182 0.875 0.85227273 0.89772727 0.87367865 0.91966173
0.919926 0.88477801 0.88530655 0.9307611 ]
mean value: 0.8982293868921776
key: train_roc_auc
value: [0.95419847 0.95547074 0.95801527 0.95801527 0.95931982 0.95551595
0.96186435 0.96063084 0.95554178 0.95681727]
mean value: 0.9575389752134433
key: test_jcc
value: [0.89583333 0.78431373 0.74 0.81632653 0.7755102 0.85106383
0.85416667 0.8 0.79166667 0.875 ]
mean value: 0.8183880956637974
key: train_jcc
value: [0.91408115 0.91586538 0.92028986 0.92048193 0.92307692 0.91566265
0.92753623 0.9253012 0.91566265 0.91826923]
mean value: 0.9196227204737726
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
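
The `FutureWarning` raised for Random Forest2 states that `max_features='auto'` is deprecated and that `'sqrt'` keeps the past behaviour for random forest classifiers. A minimal sketch of the adjusted constructor call follows; the variable name `rf2` is hypothetical and the remaining arguments are copied from the logged model definition.

```python
from sklearn.ensemble import RandomForestClassifier

# 'sqrt' is what 'auto' meant for classifiers, per the warning text,
# so this should silence the warning without changing the model.
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=1000,
    n_jobs=10,
    oob_score=True,
    random_state=42,
)
print(rf2.get_params()["max_features"])
```

Dropping the `max_features` argument entirely would be equivalent, since the warning notes that `'sqrt'` is also the default.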
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02685237 0.01579261 0.01570487 0.01578689 0.01585865 0.01588416
0.01590824 0.0158875 0.01584315 0.01586103]
mean value: 0.016937947273254393
key: score_time
value: [0.01227856 0.01211953 0.01234627 0.01220894 0.01220393 0.01216817
0.01225019 0.01219034 0.01222849 0.01226616]
mean value: 0.012226057052612305
key: test_mcc
value: [0.68252363 0.36514837 0.34530694 0.52613536 0.61090601 0.3853797
0.52126134 0.40330006 0.54295079 0.24714945]
mean value: 0.46300616567387165
key: train_mcc
value: [0.47428882 0.51540005 0.50501003 0.4893689 0.47802164 0.49540494
0.48842804 0.50058709 0.49168322 0.49679032]
mean value: 0.4934983053216603
key: test_accuracy
value: [0.84090909 0.68181818 0.67045455 0.76136364 0.8045977 0.68965517
0.75862069 0.70114943 0.77011494 0.62068966]
mean value: 0.729937304075235
key: train_accuracy
value: [0.73536896 0.75699746 0.7519084 0.74300254 0.73824651 0.74714104
0.7433291 0.74968234 0.74459975 0.74714104]
mean value: 0.7457417124972922
key: test_fscore
value: [0.84444444 0.69565217 0.69473684 0.77419355 0.80898876 0.70967742
0.76923077 0.7173913 0.76190476 0.66666667]
mean value: 0.7442886694399654
key: train_fscore
value: [0.75059952 0.76564417 0.7601476 0.75721154 0.74878049 0.75582822
0.75425791 0.75768758 0.75636364 0.75878788]
mean value: 0.7565308540334024
key: test_precision
value: [0.82608696 0.66666667 0.64705882 0.73469388 0.7826087 0.66
0.72916667 0.6875 0.8 0.6 ]
mean value: 0.7133781686587679
key: train_precision
value: [0.70975057 0.73933649 0.73571429 0.71753986 0.72065728 0.73159145
0.72429907 0.73333333 0.72222222 0.72453704]
mean value: 0.725898159276402
key: test_recall
value: [0.86363636 0.72727273 0.75 0.81818182 0.8372093 0.76744186
0.81395349 0.75 0.72727273 0.75 ]
mean value: 0.7804968287526427
key: train_recall
value: [0.79643766 0.79389313 0.78625954 0.80152672 0.77918782 0.78172589
0.78680203 0.78371501 0.79389313 0.79643766]
mean value: 0.7899878585913382
key: test_roc_auc
value: [0.84090909 0.68181818 0.67045455 0.76136364 0.80496829 0.69053911
0.75924947 0.7005814 0.77061311 0.61918605]
mean value: 0.7299682875264271
key: train_roc_auc
value: [0.73536896 0.75699746 0.7519084 0.74300254 0.73819442 0.74709704
0.74327379 0.74972553 0.7446623 0.7472036 ]
mean value: 0.7457434029526873
key: test_jcc
value: [0.73076923 0.53333333 0.53225806 0.63157895 0.67924528 0.55
0.625 0.55932203 0.61538462 0.5 ]
mean value: 0.5956891508288903
key: train_jcc
value: [0.60076775 0.62027833 0.61309524 0.60928433 0.59844055 0.60749507
0.60546875 0.60990099 0.60818713 0.61132812]
mean value: 0.6084246269566757
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16049242 0.13136196 0.14000368 0.14280915 0.29893279 0.13998246
0.15176749 0.14539051 0.1494689 0.14014292]
mean value: 0.16003522872924805
key: score_time
value: [0.01117945 0.0121758 0.01154327 0.01159501 0.01143241 0.01127696
0.01138067 0.01289082 0.01135778 0.01130652]
mean value: 0.011613869667053222
key: test_mcc
value: [0.90909091 0.77594029 0.77272727 0.81902836 0.77008457 0.77312462
0.84118687 0.77008457 0.81972843 0.81935269]
mean value: 0.8070348585820003
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95454545 0.88636364 0.88636364 0.90909091 0.88505747 0.88505747
0.91954023 0.88505747 0.90804598 0.90804598]
mean value: 0.9027168234064786
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95454545 0.89130435 0.88636364 0.91111111 0.88372093 0.87804878
0.92134831 0.88636364 0.9047619 0.91304348]
mean value: 0.9030611594559804
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95454545 0.85416667 0.88636364 0.89130435 0.88372093 0.92307692
0.89130435 0.88636364 0.95 0.875 ]
mean value: 0.8995845942901048
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.93181818 0.88636364 0.93181818 0.88372093 0.8372093
0.95348837 0.88636364 0.86363636 0.95454545]
mean value: 0.9083509513742072
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.88636364 0.88636364 0.90909091 0.88504228 0.88451374
0.919926 0.88504228 0.90856237 0.90750529]
mean value: 0.9026955602536998
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91304348 0.80392157 0.79591837 0.83673469 0.79166667 0.7826087
0.85416667 0.79591837 0.82608696 0.84 ]
mean value: 0.8240065460966995
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
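
As a worked check on how the `mcc` and `jcc` values relate to the other metrics, the sketch below computes them from confusion-matrix counts. The counts TP=42, FP=2, FN=2, TN=42 are an assumption: they are not printed in the log, but they are consistent with the first XGBoost fold above (accuracy, precision and recall all 0.95454545). The function names are hypothetical.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def jaccard(tp, fp, fn):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

# Counts inferred from the first XGBoost fold (assumption, see lead-in).
tp, fp, fn, tn = 42, 2, 2, 42

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("mcc:", mcc(tp, fp, fn, tn))
print("jcc:", jaccard(tp, fp, fn))
```

With these counts the MCC comes out at 1760/1936 and the Jaccard index at 42/46, which agree with the first entries of `test_mcc` and `test_jcc` for XGBoost.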
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.06331563 0.08405757 0.09509635 0.07134914 0.07386637 0.05402017
0.08309579 0.09428 0.09064746 0.09419298]
mean value: 0.08039214611053466
key: score_time
value: [0.02636814 0.01936841 0.02469587 0.01357269 0.01332116 0.01356101
0.02252293 0.02556062 0.02522826 0.02603841]
mean value: 0.02102375030517578
key: test_mcc
value: [0.66759342 0.63900965 0.38888266 0.52286233 0.69052856 0.65994555
0.60331932 0.61648587 0.56319416 0.5404983 ]
mean value: 0.5892319806067103
key: train_mcc
value: [0.77627846 0.78530555 0.77151353 0.79550432 0.79120432 0.79390425
0.74423479 0.79970843 0.79223524 0.77729425]
mean value: 0.7827183127675109
key: test_accuracy
value: [0.82954545 0.81818182 0.69318182 0.76136364 0.83908046 0.82758621
0.79310345 0.8045977 0.7816092 0.77011494]
mean value: 0.7918364681295715
key: train_accuracy
value: [0.88676845 0.89185751 0.88422392 0.89694656 0.89453621 0.89580686
0.8703939 0.89834816 0.89453621 0.88691233]
mean value: 0.8900330109831841
key: test_fscore
value: [0.84210526 0.82608696 0.70967742 0.76404494 0.85106383 0.83516484
0.8125 0.82105263 0.78651685 0.77777778]
mean value: 0.8025990511096076
key: train_fscore
value: [0.89133089 0.89519112 0.88915956 0.9001233 0.89840881 0.8997555
0.87651332 0.90243902 0.89890378 0.89185905]
mean value: 0.8943684363139492
key: test_precision
value: [0.78431373 0.79166667 0.67346939 0.75555556 0.78431373 0.79166667
0.73584906 0.76470588 0.77777778 0.76086957]
mean value: 0.7620188009576266
key: train_precision
value: [0.85680751 0.86842105 0.85280374 0.87320574 0.86761229 0.86792453
0.83796296 0.86651054 0.86214953 0.85348837]
mean value: 0.8606886272167267
key: test_recall
value: [0.90909091 0.86363636 0.75 0.77272727 0.93023256 0.88372093
0.90697674 0.88636364 0.79545455 0.79545455]
mean value: 0.8493657505285412
key: train_recall
value: [0.92875318 0.92366412 0.92875318 0.92875318 0.93147208 0.93401015
0.91878173 0.94147583 0.9389313 0.93384224]
mean value: 0.9308436987380685
key: test_roc_auc
value: [0.82954545 0.81818182 0.69318182 0.76136364 0.84011628 0.8282241
0.79439746 0.80364693 0.7814482 0.7698203 ]
mean value: 0.7919926004228329
key: train_roc_auc
value: [0.88676845 0.89185751 0.88422392 0.89694656 0.89448922 0.89575826
0.87033234 0.89840289 0.89459255 0.88697188]
mean value: 0.8900343576032342
key: test_jcc
value: [0.72727273 0.7037037 0.55 0.61818182 0.74074074 0.71698113
0.68421053 0.69642857 0.64814815 0.63636364]
mean value: 0.6722031004230606
key: train_jcc
value: [0.80396476 0.81026786 0.8004386 0.81838565 0.81555556 0.81777778
0.78017241 0.82222222 0.81637168 0.80482456]
mean value: 0.8089981073735648
MCC on Blind test: 0.33
Accuracy on Blind test: 0.68
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02132583 0.01521826 0.01540136 0.01523566 0.01541209 0.01545763
0.01539135 0.01569581 0.01536846 0.01550746]
mean value: 0.016001391410827636
key: score_time
value: [0.02263594 0.01224017 0.01216555 0.01229405 0.01220536 0.01214266
0.01215577 0.01224852 0.01231098 0.01221323]
mean value: 0.013261222839355468
key: test_mcc
value: [0.50847518 0.40951418 0.3640126 0.60092521 0.58699109 0.51744186
0.46196713 0.24188306 0.42547569 0.40794313]
mean value: 0.45246291247679393
key: train_mcc
value: [0.47250952 0.48272595 0.47741223 0.47563022 0.4743098 0.47161961
0.48354281 0.48303054 0.48343926 0.47368777]
mean value: 0.47779077011814536
key: test_accuracy
value: [0.75 0.70454545 0.68181818 0.79545455 0.79310345 0.75862069
0.72413793 0.62068966 0.71264368 0.70114943]
mean value: 0.7242163009404389
key: train_accuracy
value: [0.73536896 0.74045802 0.73791349 0.73664122 0.73570521 0.73443456
0.7407878 0.7407878 0.7407878 0.73570521]
mean value: 0.7378590065666314
key: test_fscore
value: [0.77083333 0.71111111 0.68888889 0.8125 0.79545455 0.75862069
0.75 0.64516129 0.71264368 0.72916667]
mean value: 0.7374380203593218
key: train_fscore
value: [0.74634146 0.75121951 0.74816626 0.74909091 0.75 0.74849579
0.75242718 0.75 0.75121951 0.74757282]
mean value: 0.7494533444271472
key: test_precision
value: [0.71153846 0.69565217 0.67391304 0.75 0.77777778 0.75
0.67924528 0.6122449 0.72093023 0.67307692]
mean value: 0.7044378793320658
key: train_precision
value: [0.71662763 0.72131148 0.72 0.71527778 0.71232877 0.71167048
0.72093023 0.72340426 0.72131148 0.71461717]
mean value: 0.7177479268181197
key: test_recall
value: [0.84090909 0.72727273 0.70454545 0.88636364 0.81395349 0.76744186
0.8372093 0.68181818 0.70454545 0.79545455]
mean value: 0.7759513742071882
key: train_recall
value: [0.77862595 0.78371501 0.77862595 0.78625954 0.79187817 0.7893401
0.78680203 0.77862595 0.78371501 0.78371501]
mean value: 0.784130274731662
key: test_roc_auc
value: [0.75 0.70454545 0.68181818 0.79545455 0.79334038 0.75872093
0.72542283 0.61997886 0.71273784 0.70005285]
mean value: 0.7242071881606765
key: train_roc_auc
value: [0.73536896 0.74045802 0.73791349 0.73664122 0.73563374 0.73436471
0.74072926 0.74083582 0.74084228 0.73576614]
mean value: 0.7378553622402191
key: test_jcc
value: [0.62711864 0.55172414 0.52542373 0.68421053 0.66037736 0.61111111
0.6 0.47619048 0.55357143 0.57377049]
mean value: 0.5863497903295041
key: train_jcc
value: [0.59533074 0.6015625 0.59765625 0.59883721 0.6 0.59807692
0.60311284 0.6 0.6015625 0.59689922]
mean value: 0.5993038186951987
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02958322 0.02433634 0.02579641 0.0442853 0.03078198 0.02677584
0.02552652 0.02910757 0.02560163 0.02800775]
mean value: 0.028980255126953125
key: score_time
value: [0.01246619 0.01228714 0.01220798 0.01227021 0.01223183 0.01228809
0.01225042 0.01228547 0.01230955 0.0122788 ]
mean value: 0.012287569046020509
key: test_mcc
value: [0.63636364 0.3796283 0.33241884 0.53674504 0.75739672 0.56342495
0.59245365 0.35625628 0.61774328 0.15163988]
mean value: 0.4924070576625966
key: train_mcc
value: [0.70851405 0.46547729 0.45428788 0.57137778 0.72539042 0.71336542
0.67528879 0.31755015 0.6011329 0.16168478]
mean value: 0.5394069469480227
key: test_accuracy
value: [0.81818182 0.65909091 0.625 0.75 0.87356322 0.7816092
0.79310345 0.6091954 0.79310345 0.51724138]
mean value: 0.7220088819226751
key: train_accuracy
value: [0.8524173 0.68956743 0.67430025 0.75826972 0.86022872 0.85641677
0.83735705 0.59593393 0.77382465 0.52604828]
mean value: 0.742436411017456
key: test_fscore
value: [0.81818182 0.53125 0.44067797 0.69444444 0.88172043 0.7816092
0.80434783 0.37037037 0.82352941 0.08695652]
mean value: 0.6233087984198946
key: train_fscore
value: [0.84450402 0.56272401 0.52059925 0.69255663 0.86810552 0.8538163
0.83419689 0.32627119 0.81223629 0.0968523 ]
mean value: 0.6411862401536421
key: test_precision
value: [0.81818182 0.85 0.86666667 0.89285714 0.82 0.77272727
0.75510204 1. 0.72413793 1. ]
mean value: 0.849967287228371
key: train_precision
value: [0.89235127 0.95151515 0.9858156 0.95111111 0.82272727 0.8707124
0.85185185 0.97468354 0.69369369 1. ]
mean value: 0.8994461903882702
key: test_recall
value: [0.81818182 0.38636364 0.29545455 0.56818182 0.95348837 0.79069767
0.86046512 0.22727273 0.95454545 0.04545455]
mean value: 0.5900105708245243
key: train_recall
value: [0.80152672 0.39949109 0.35368957 0.54452926 0.91878173 0.83756345
0.81725888 0.19592875 0.97964377 0.05089059]
mean value: 0.589930380646078
key: test_roc_auc
value: [0.81818182 0.65909091 0.625 0.75 0.87447146 0.78171247
0.79386892 0.61363636 0.79122622 0.52272727]
mean value: 0.7229915433403805
key: train_roc_auc
value: [0.8524173 0.68956743 0.67430025 0.75826972 0.86015422 0.85644076
0.83738262 0.59542631 0.77408584 0.52544529]
mean value: 0.7423489750842794
key: test_jcc
value: [0.69230769 0.36170213 0.2826087 0.53191489 0.78846154 0.64150943
0.67272727 0.22727273 0.7 0.04545455]
mean value: 0.494395892711481
key: train_jcc
value: [0.73085847 0.3915212 0.35189873 0.52970297 0.76694915 0.74492099
0.71555556 0.19493671 0.68383659 0.05089059]
mean value: 0.5161070955285676
MCC on Blind test: 0.45
Accuracy on Blind test: 0.66
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02404928 0.03382921 0.04285789 0.03751874 0.03870797 0.02889562
0.03123188 0.03173637 0.03078961 0.03123069]
mean value: 0.03308472633361816
key: score_time
value: [0.01227283 0.01235342 0.01227856 0.01233625 0.01231527 0.01219463
0.01230526 0.02353597 0.0126853 0.01253462]
mean value: 0.013481211662292481
key: test_mcc
value: [0.70472748 0.5933661 0.46225016 0.57188626 0.65539112 0.19116707
0.61789034 0.5504913 0.63213531 0.64236223]
mean value: 0.5621667377709061
key: train_mcc
value: [0.72552928 0.76845498 0.73666335 0.75203977 0.74647425 0.20313543
0.74500921 0.68146897 0.70781234 0.7233087 ]
mean value: 0.6789896271218194
key: test_accuracy
value: [0.85227273 0.79545455 0.72727273 0.78409091 0.82758621 0.54022989
0.8045977 0.77011494 0.81609195 0.81609195]
mean value: 0.773380355276907
key: train_accuracy
value: [0.86132316 0.88295165 0.86386768 0.87531807 0.8729352 0.54129606
0.8703939 0.8360864 0.85387548 0.85387548]
mean value: 0.8311923075679538
key: test_fscore
value: [0.85393258 0.80434783 0.75 0.79569892 0.82758621 0.13043478
0.8172043 0.75 0.81818182 0.83333333]
mean value: 0.738071977718347
key: train_fscore
value: [0.85486019 0.88753056 0.87367178 0.87901235 0.87562189 0.15850816
0.87710843 0.82108183 0.85461441 0.86735871]
mean value: 0.7949368311113626
key: test_precision
value: [0.84444444 0.77083333 0.69230769 0.75510204 0.81818182 1.
0.76 0.83333333 0.81818182 0.76923077]
mean value: 0.8061615249829536
key: train_precision
value: [0.89664804 0.85411765 0.81497797 0.85371703 0.85853659 0.97142857
0.83486239 0.90243902 0.84924623 0.79324895]
mean value: 0.8629222434507968
key: test_recall
value: [0.86363636 0.84090909 0.81818182 0.84090909 0.8372093 0.06976744
0.88372093 0.68181818 0.81818182 0.90909091]
mean value: 0.7563424947145877
key: train_recall
value: [0.81679389 0.92366412 0.94147583 0.90585242 0.89340102 0.08629442
0.92385787 0.75318066 0.86005089 0.956743 ]
mean value: 0.806131411374175
key: test_roc_auc
value: [0.85227273 0.79545455 0.72727273 0.78409091 0.82769556 0.53488372
0.80549683 0.77114165 0.81606765 0.81501057]
mean value: 0.772938689217759
key: train_roc_auc
value: [0.86132316 0.88295165 0.86386768 0.87531807 0.87290916 0.54187494
0.87032588 0.83598119 0.85388331 0.85400602]
mean value: 0.8312441068960618
key: test_jcc
value: [0.74509804 0.67272727 0.6 0.66071429 0.70588235 0.06976744
0.69090909 0.6 0.69230769 0.71428571]
mean value: 0.6151691889961384
key: train_jcc
value: [0.74651163 0.7978022 0.77568134 0.78414097 0.77876106 0.08607595
0.78111588 0.69647059 0.74613687 0.76578411]
mean value: 0.6958480595363976
MCC on Blind test: 0.47
Accuracy on Blind test: 0.68
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.27864838 0.27138972 0.2752707 0.27588749 0.2660768 0.27190518
0.27267504 0.27442336 0.26846004 0.26556134]
mean value: 0.27202980518341063
key: score_time
value: [0.01731539 0.01707101 0.01773667 0.01637697 0.01633453 0.01748466
0.01726389 0.01764894 0.01723289 0.01652169]
mean value: 0.017098665237426758
key: test_mcc
value: [0.86363636 0.75174939 0.73413035 0.81818182 0.81683533 0.7472238
0.7951307 0.72689655 0.79323121 0.81935269]
mean value: 0.7866368198854106
key: train_mcc
value: [0.88323763 0.89581641 0.90099965 0.90360046 0.90632277 0.89603807
0.87806138 0.91112948 0.91370369 0.89096032]
mean value: 0.8979869877722635
key: test_accuracy
value: [0.93181818 0.875 0.86363636 0.90909091 0.90804598 0.87356322
0.89655172 0.86206897 0.89655172 0.90804598]
mean value: 0.8924373040752351
key: train_accuracy
value: [0.94147583 0.94783715 0.95038168 0.95165394 0.95298602 0.94790343
0.93900889 0.95552732 0.95679797 0.94536213]
mean value: 0.9488934369250964
key: test_fscore
value: [0.93181818 0.87912088 0.87234043 0.90909091 0.9047619 0.87058824
0.8988764 0.86956522 0.8988764 0.91304348]
mean value: 0.8948082040258845
key: train_fscore
value: [0.94221106 0.9482976 0.9509434 0.95226131 0.95369212 0.94855709
0.93939394 0.95575221 0.95707071 0.94591195]
mean value: 0.9494091374838326
key: test_precision
value: [0.93181818 0.85106383 0.82 0.90909091 0.92682927 0.88095238
0.86956522 0.83333333 0.88888889 0.875 ]
mean value: 0.8786542009554915
key: train_precision
value: [0.93052109 0.94 0.94029851 0.94044665 0.94074074 0.93796526
0.93467337 0.94974874 0.94987469 0.93532338]
mean value: 0.939959243103895
key: test_recall
value: [0.93181818 0.90909091 0.93181818 0.90909091 0.88372093 0.86046512
0.93023256 0.90909091 0.90909091 0.95454545]
mean value: 0.9128964059196617
key: train_recall
value: [0.95419847 0.956743 0.96183206 0.96437659 0.96700508 0.95939086
0.94416244 0.96183206 0.96437659 0.956743 ]
mean value: 0.9590660156805001
key: test_roc_auc
value: [0.93181818 0.875 0.86363636 0.90909091 0.90776956 0.87341438
0.89693446 0.8615222 0.89640592 0.90750529]
mean value: 0.8923097251585623
key: train_roc_auc
value: [0.94147583 0.94783715 0.95038168 0.95165394 0.95296819 0.94788882
0.93900234 0.95553532 0.95680758 0.94537658]
mean value: 0.9488927422792266
key: test_jcc
value: [0.87234043 0.78431373 0.77358491 0.83333333 0.82608696 0.77083333
0.81632653 0.76923077 0.81632653 0.84 ]
mean value: 0.8102376510326154
key: train_jcc
value: [0.89073634 0.90167866 0.90647482 0.9088729 0.91148325 0.90214797
0.88571429 0.91525424 0.91767554 0.8973747 ]
mean value: 0.903741271535579
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.21593761 0.12754798 0.21600103 0.12391162 0.22773957 0.23310304
0.23577619 0.22948527 0.23245597 0.24050713]
mean value: 0.2082465410232544
key: score_time
value: [0.04016471 0.02497435 0.04214454 0.01842999 0.04276419 0.03724957
0.04361939 0.04263687 0.04333544 0.03482246]
mean value: 0.03701415061950684
key: test_mcc
value: [0.88659264 0.75174939 0.77352678 0.79566006 0.81702814 0.72410148
0.81683533 0.79480784 0.75240169 0.88524603]
mean value: 0.7997949390227214
key: train_mcc
value: [0.98735727 0.98728055 0.98221691 0.97717833 0.98988607 0.97478912
0.98476502 0.98737301 0.97971996 0.98228992]
mean value: 0.983285614378063
key: test_accuracy
value: [0.94318182 0.875 0.88636364 0.89772727 0.90804598 0.86206897
0.90804598 0.89655172 0.87356322 0.94252874]
mean value: 0.8993077324973877
key: train_accuracy
value: [0.99363868 0.99363868 0.99109415 0.98854962 0.99491741 0.98729352
0.99237611 0.99364676 0.98983482 0.99110546]
mean value: 0.9916095198373053
key: test_fscore
value: [0.94252874 0.87912088 0.88372093 0.8988764 0.90909091 0.86046512
0.9047619 0.9010989 0.86746988 0.94382022]
mean value: 0.8990953884947962
key: train_fscore
value: [0.99359795 0.99363057 0.99106003 0.98847631 0.99489796 0.98717949
0.99236641 0.99359795 0.98976982 0.99103713]
mean value: 0.9915613625330997
key: test_precision
value: [0.95348837 0.85106383 0.9047619 0.88888889 0.88888889 0.86046512
0.92682927 0.87234043 0.92307692 0.93333333]
mean value: 0.9003136950933864
key: train_precision
value: [1. 0.99489796 0.99487179 0.99484536 1. 0.99740933
0.99489796 1. 0.99485861 0.99742268]
mean value: 0.9969203692726318
key: test_recall
value: [0.93181818 0.90909091 0.86363636 0.90909091 0.93023256 0.86046512
0.88372093 0.93181818 0.81818182 0.95454545]
mean value: 0.8992600422832981
key: train_recall
value: [0.98727735 0.99236641 0.98727735 0.9821883 0.98984772 0.97715736
0.98984772 0.98727735 0.98473282 0.98473282]
mean value: 0.9862705209180972
key: test_roc_auc
value: [0.94318182 0.875 0.88636364 0.89772727 0.9082981 0.86205074
0.90776956 0.89614165 0.87420719 0.94238901]
mean value: 0.8993128964059196
key: train_roc_auc
value: [0.99363868 0.99363868 0.99109415 0.98854962 0.99492386 0.98730642
0.99237933 0.99363868 0.98982834 0.99109738]
mean value: 0.9916095116312111
key: test_jcc
value: [0.89130435 0.78431373 0.79166667 0.81632653 0.83333333 0.75510204
0.82608696 0.82 0.76595745 0.89361702]
mean value: 0.81777080693517
key: train_jcc
value: [0.98727735 0.98734177 0.98227848 0.97721519 0.98984772 0.97468354
0.98484848 0.98727735 0.97974684 0.9822335 ]
mean value: 0.9832750233286541
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.37668085 0.38118434 0.36029553 0.38846707 0.28293347 0.33020926
0.3172996 0.28086233 0.47087789 0.33524823]
mean value: 0.35240585803985597
key: score_time
value: [0.03350472 0.0394361 0.01947856 0.03539586 0.03546238 0.01960278
0.01966977 0.0196321 0.02036977 0.02100039]
mean value: 0.02635524272918701
key: test_mcc
value: [0.67332702 0.41079192 0.59090909 0.54601891 0.59245365 0.59717038
0.44896886 0.35707199 0.51744186 0.54198427]
mean value: 0.5276137954634224
key: train_mcc
value: [0.92486253 0.90935126 0.9146712 0.91009558 0.91454821 0.91260837
0.91502105 0.91530625 0.91478356 0.92189349]
mean value: 0.9153141494896055
key: test_accuracy
value: [0.82954545 0.70454545 0.79545455 0.77272727 0.79310345 0.79310345
0.72413793 0.67816092 0.75862069 0.77011494]
mean value: 0.7619514106583072
key: train_accuracy
value: [0.96183206 0.95419847 0.956743 0.95419847 0.95679797 0.95552732
0.95679797 0.95679797 0.95679797 0.96060991]
mean value: 0.9570301108018016
key: test_fscore
value: [0.84536082 0.7173913 0.79545455 0.77777778 0.80434783 0.80851064
0.72727273 0.69565217 0.75862069 0.7826087 ]
mean value: 0.7712997203200364
key: train_fscore
value: [0.96277916 0.95522388 0.95781638 0.95555556 0.95781638 0.9568434
0.95802469 0.95802469 0.95781638 0.96129838]
mean value: 0.9581198886944443
key: test_precision
value: [0.77358491 0.6875 0.79545455 0.76086957 0.75510204 0.74509804
0.71111111 0.66666667 0.76744186 0.75 ]
mean value: 0.7412828734607221
key: train_precision
value: [0.93946731 0.93430657 0.9346247 0.92805755 0.9368932 0.93045564
0.93269231 0.93045564 0.9346247 0.94362745]
mean value: 0.9345205063861101
key: test_recall
value: [0.93181818 0.75 0.79545455 0.79545455 0.86046512 0.88372093
0.74418605 0.72727273 0.75 0.81818182]
mean value: 0.8056553911205074
key: train_recall
value: [0.98727735 0.97709924 0.9821883 0.98473282 0.97969543 0.98477157
0.98477157 0.98727735 0.9821883 0.97964377]
mean value: 0.9829645703362136
key: test_roc_auc
value: [0.82954545 0.70454545 0.79545455 0.77272727 0.79386892 0.79413319
0.72436575 0.67758985 0.75872093 0.76955603]
mean value: 0.7620507399577168
key: train_roc_auc
value: [0.96183206 0.95419847 0.956743 0.95419847 0.95676884 0.95549011
0.95676238 0.95683665 0.95683019 0.96063407]
mean value: 0.9570294235414164
key: test_jcc
value: [0.73214286 0.55932203 0.66037736 0.63636364 0.67272727 0.67857143
0.57142857 0.53333333 0.61111111 0.64285714]
mean value: 0.6298234745924225
key: train_jcc
value: [0.92822967 0.91428571 0.91904762 0.91489362 0.91904762 0.91725768
0.91943128 0.91943128 0.91904762 0.92548077]
mean value: 0.9196152865209224
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
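The per-fold scores above can be reproduced from a single confusion matrix. The sketch below is an illustration, not the script's own code. The counts TP=41, FP=12, FN=3, TN=32 are not printed anywhere in this log; they are inferred from the first Gaussian Process fold (precision 0.77358491 = 41/53, recall 0.93181818 = 41/44, accuracy 0.82954545 = 73/88). The function name `binary_metrics` is hypothetical. It also assumes that `roc_auc` was scored on hard class predictions, which would explain why it tracks accuracy so closely on these near-balanced folds.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics reported in the log from one fold's confusion counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Denominator of the Matthews correlation coefficient
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        # With hard 0/1 predictions, ROC AUC reduces to balanced accuracy
        "roc_auc": (recall + specificity) / 2,
    }


# Inferred (assumed) counts for fold 1 of the Gaussian Process model
fold1 = binary_metrics(tp=41, fp=12, fn=3, tn=32)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts every value agrees with the first entry of the corresponding `test_*` array above, which is a quick consistency check on how the scores relate to each other.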
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.16928625 1.14868641 1.13823104 1.14943147 1.14699316 1.14772367
1.1377604 1.14932823 1.15342426 1.15133929]
mean value: 1.1492204189300537
key: score_time
value: [0.0095892 0.0095458 0.00968099 0.0096097 0.009974 0.0093739
0.00960636 0.00960088 0.01036811 0.00943041]
mean value: 0.009677934646606445
key: test_mcc
value: [0.90909091 0.75174939 0.84112635 0.82158384 0.7951307 0.83923862
0.90904296 0.79480784 0.81972843 0.84093745]
mean value: 0.8322436481865038
key: train_mcc
value: [0.94659247 0.9694782 0.964489 0.96951587 0.96447209 0.96453461
0.96203358 0.96961653 0.96443403 0.9542566 ]
mean value: 0.9629422979060673
key: test_accuracy
value: [0.95454545 0.875 0.92045455 0.90909091 0.89655172 0.91954023
0.95402299 0.89655172 0.90804598 0.91954023]
mean value: 0.9153343782654128
key: train_accuracy
value: [0.97328244 0.98473282 0.9821883 0.98473282 0.98221093 0.98221093
0.98094028 0.98475222 0.98221093 0.97712834]
mean value: 0.9814390008115335
key: test_fscore
value: [0.95454545 0.87912088 0.92134831 0.91304348 0.8988764 0.91764706
0.95454545 0.9010989 0.9047619 0.92307692]
mean value: 0.916806477333504
key: train_fscore
value: [0.97318008 0.98469388 0.98205128 0.98465473 0.98214286 0.98209719
0.98079385 0.98461538 0.98214286 0.97709924]
mean value: 0.9813471343964834
key: test_precision
value: [0.95454545 0.85106383 0.91111111 0.875 0.86956522 0.92857143
0.93333333 0.87234043 0.95 0.89361702]
mean value: 0.9039147821548377
key: train_precision
value: [0.97692308 0.98721228 0.98966408 0.98971722 0.98717949 0.98969072
0.98966408 0.99224806 0.98465473 0.97709924]
mean value: 0.986405298110647
key: test_recall
value: [0.95454545 0.90909091 0.93181818 0.95454545 0.93023256 0.90697674
0.97674419 0.93181818 0.86363636 0.95454545]
mean value: 0.9313953488372093
key: train_recall
value: [0.96946565 0.9821883 0.97455471 0.97964377 0.97715736 0.97461929
0.97208122 0.97709924 0.97964377 0.97709924]
mean value: 0.9763552524508854
key: test_roc_auc
value: [0.95454545 0.875 0.92045455 0.90909091 0.89693446 0.91939746
0.95428118 0.89614165 0.90856237 0.91913319]
mean value: 0.9153541226215645
key: train_roc_auc
value: [0.97328244 0.98473282 0.9821883 0.98473282 0.98221736 0.98222059
0.98095155 0.98474251 0.98220767 0.9771283 ]
mean value: 0.9814404360574004
key: test_jcc
value: [0.91304348 0.78431373 0.85416667 0.84 0.81632653 0.84782609
0.91304348 0.82 0.82608696 0.85714286]
mean value: 0.8471949779911965
key: train_jcc
value: [0.94776119 0.96984925 0.96473552 0.9697733 0.96491228 0.96482412
0.96231156 0.96969697 0.96491228 0.95522388]
mean value: 0.9634000346471366
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
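Each "mean value" line is the arithmetic mean of the ten fold scores printed above it. A minimal sketch, using only the Gradient Boosting MCC values copied from this log, recomputes the means and the two gaps worth watching: train versus cross-validation, and cross-validation versus blind test. The variable names are illustrative, and the blind-test MCC is only available rounded to two decimals.

```python
from statistics import mean, stdev

# Fold scores copied from the Gradient Boosting block of this log
test_mcc = [0.90909091, 0.75174939, 0.84112635, 0.82158384, 0.7951307,
            0.83923862, 0.90904296, 0.79480784, 0.81972843, 0.84093745]
train_mcc = [0.94659247, 0.9694782, 0.964489, 0.96951587, 0.96447209,
             0.96453461, 0.96203358, 0.96961653, 0.96443403, 0.9542566]
blind_mcc = 0.67  # "MCC on Blind test", as printed (rounded)

cv_mean = mean(test_mcc)
cv_sd = stdev(test_mcc)          # sample standard deviation across folds
train_gap = mean(train_mcc) - cv_mean
blind_gap = cv_mean - blind_mcc

print(f"CV MCC: {cv_mean:.4f} +/- {cv_sd:.4f}")
print(f"train - CV gap: {train_gap:.4f}")
print(f"CV - blind gap: {blind_gap:.4f}")
```

Reporting the fold spread next to the mean makes it easier to judge whether the drop on the blind test is larger than fold-to-fold variation alone would suggest.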
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04250526 0.04080033 0.04067802 0.03934932 0.04005003 0.04622293
0.04257751 0.03894424 0.03844476 0.04671168]
mean value: 0.04162840843200684
key: score_time
value: [0.01281166 0.01308036 0.01333427 0.01313806 0.01291275 0.01286411
0.01297784 0.01300836 0.01292276 0.0129571 ]
mean value: 0.013000726699829102
key: test_mcc
value: [0.15811388 0.12598816 0.04908807 0.09016696 0.20790225 0.04655125
0.14533074 0.13018178 0.14830203 0.19262997]
mean value: 0.12942550981727555
key: train_mcc
value: [0.23759548 0.25503069 0.26889699 0.25503069 0.2316976 0.25233931
0.25800118 0.27907707 0.27376251 0.23713446]
mean value: 0.25485659746651074
key: test_accuracy
value: [0.54545455 0.53409091 0.51136364 0.52272727 0.55172414 0.50574713
0.52873563 0.54022989 0.55172414 0.56321839]
mean value: 0.5355015673981192
key: train_accuracy
value: [0.55343511 0.5610687 0.56743003 0.5610687 0.55146125 0.56035578
0.56289708 0.57179161 0.56925032 0.55273189]
mean value: 0.5611490473372972
key: test_fscore
value: [0.67741935 0.672 0.66141732 0.66666667 0.68292683 0.656
0.672 0.67741935 0.67768595 0.68852459]
mean value: 0.6732060069024183
key: train_fscore
value: [0.69129288 0.69496021 0.69804618 0.69496021 0.69062226 0.69488536
0.69611307 0.69991095 0.69866667 0.69068541]
mean value: 0.695014321097323
key: test_precision
value: [0.525 0.51851852 0.5060241 0.51219512 0.525 0.5
0.51219512 0.525 0.53246753 0.53846154]
mean value: 0.519486192973557
key: train_precision
value: [0.52822581 0.53252033 0.5361528 0.53252033 0.52744311 0.53243243
0.53387534 0.53835616 0.53688525 0.52751678]
mean value: 0.5325928319334771
key: test_recall
value: [0.95454545 0.95454545 0.95454545 0.95454545 0.97674419 0.95348837
0.97674419 0.95454545 0.93181818 0.95454545]
mean value: 0.9566067653276956
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.54545455 0.53409091 0.51136364 0.52272727 0.55655391 0.5108351
0.53382664 0.53541226 0.54730444 0.55866808]
mean value: 0.5356236786469345
key: train_roc_auc
value: [0.55343511 0.5610687 0.56743003 0.5610687 0.55089059 0.55979644
0.56234097 0.57233503 0.56979695 0.55329949]
mean value: 0.5611462006432363
key: test_jcc
value: [0.51219512 0.5060241 0.49411765 0.5 0.51851852 0.48809524
0.5060241 0.51219512 0.5125 0.525 ]
mean value: 0.5074669840346103
key: train_jcc
value: [0.52822581 0.53252033 0.5361528 0.53252033 0.52744311 0.53243243
0.53387534 0.53835616 0.53688525 0.52751678]
mean value: 0.5325928319334771
MCC on Blind test: 0.06
Accuracy on Blind test: 0.45
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02579308 0.04292941 0.04280329 0.04076314 0.04551339 0.04232264
0.0476377 0.042732 0.04306769 0.04305148]
mean value: 0.04166138172149658
key: score_time
value: [0.01925159 0.01919985 0.01918101 0.01926923 0.01917887 0.01919866
0.01924324 0.0191319 0.01925159 0.01928449]
mean value: 0.019219040870666504
key: test_mcc
value: [0.67332702 0.5933661 0.52394654 0.52394654 0.74735729 0.61371748
0.63065834 0.64863047 0.61028941 0.65641902]
mean value: 0.6221658218271912
key: train_mcc
value: [0.7518026 0.75042988 0.72860483 0.75891598 0.74972171 0.74736363
0.73478421 0.76235948 0.76162178 0.73804273]
mean value: 0.7483646825055856
key: test_accuracy
value: [0.82954545 0.79545455 0.76136364 0.76136364 0.87356322 0.8045977
0.8045977 0.81609195 0.8045977 0.82758621]
mean value: 0.8078761755485894
key: train_accuracy
value: [0.8740458 0.8740458 0.86259542 0.8778626 0.8729352 0.87166455
0.86531131 0.87928844 0.87928844 0.86658196]
mean value: 0.8723619503962288
key: test_fscore
value: [0.84536082 0.80434783 0.76923077 0.76923077 0.87356322 0.81318681
0.82474227 0.83673469 0.81318681 0.83516484]
mean value: 0.8184748831138817
key: train_fscore
value: [0.88 0.87882497 0.86893204 0.88321168 0.87922705 0.87816647
0.87228916 0.88484848 0.88428745 0.87364621]
mean value: 0.8783433511013907
key: test_precision
value: [0.77358491 0.77083333 0.74468085 0.74468085 0.86363636 0.77083333
0.74074074 0.75925926 0.78723404 0.80851064]
mean value: 0.7763994318942131
key: train_precision
value: [0.84027778 0.84669811 0.83062645 0.84615385 0.83870968 0.83678161
0.83027523 0.84490741 0.84813084 0.82876712]
mean value: 0.839132807504431
key: test_recall
value: [0.93181818 0.84090909 0.79545455 0.79545455 0.88372093 0.86046512
0.93023256 0.93181818 0.84090909 0.86363636]
mean value: 0.8674418604651163
key: train_recall
value: [0.92366412 0.91348601 0.91094148 0.92366412 0.92385787 0.92385787
0.91878173 0.92875318 0.92366412 0.92366412]
mean value: 0.9214334612056161
key: test_roc_auc
value: [0.82954545 0.79545455 0.76136364 0.76136364 0.87367865 0.80523256
0.80602537 0.8147463 0.80417548 0.82716702]
mean value: 0.8078752642706131
key: train_roc_auc
value: [0.8740458 0.8740458 0.86259542 0.8778626 0.87287041 0.87159815
0.86524328 0.87935121 0.87934475 0.8666544 ]
mean value: 0.8723611810749021
key: test_jcc
value: [0.73214286 0.67272727 0.625 0.625 0.7755102 0.68518519
0.70175439 0.71929825 0.68518519 0.71698113]
mean value: 0.6938784467976552
key: train_jcc
value: [0.78571429 0.78384279 0.76824034 0.79084967 0.78448276 0.7827957
0.77350427 0.79347826 0.79257642 0.77564103]
mean value: 0.7831125533798624
MCC on Blind test: 0.33
Accuracy on Blind test: 0.68
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.32069945 0.18340683 0.38573503 0.23420286 0.37261057 0.35421038
0.28985262 0.33406305 0.21048737 0.33007169]
mean value: 0.30153398513793944
key: score_time
value: [0.01916289 0.01916862 0.02289367 0.02675509 0.01928449 0.01241636
0.02552414 0.01919389 0.01500607 0.01925492]
mean value: 0.019866013526916505
key: test_mcc
value: [0.67332702 0.59648091 0.52394654 0.52394654 0.71089459 0.65994555
0.62350092 0.62173301 0.60920157 0.65641902]
mean value: 0.6199395673145706
key: train_mcc
value: [0.7518026 0.76814463 0.72860483 0.75891598 0.78234745 0.77861045
0.75483144 0.78239885 0.79733748 0.73804273]
mean value: 0.7641036458050515
key: test_accuracy
value: [0.82954545 0.79545455 0.76136364 0.76136364 0.85057471 0.82758621
0.8045977 0.8045977 0.8045977 0.82758621]
mean value: 0.806726750261233
key: train_accuracy
value: [0.8740458 0.88295165 0.86259542 0.8778626 0.88945362 0.88818297
0.87547649 0.88945362 0.89707751 0.86658196]
mean value: 0.8803681646087341
key: test_fscore
value: [0.84536082 0.80851064 0.76923077 0.76923077 0.86021505 0.83516484
0.82105263 0.82474227 0.80898876 0.83516484]
mean value: 0.8177661389259918
key: train_fscore
value: [0.88 0.8872549 0.86893204 0.88321168 0.89454545 0.89242054
0.88164251 0.89428919 0.90133983 0.87364621]
mean value: 0.8857282348915667
key: test_precision
value: [0.77358491 0.76 0.74468085 0.74468085 0.8 0.79166667
0.75 0.75471698 0.8 0.80851064]
mean value: 0.7727840893884651
key: train_precision
value: [0.84027778 0.85579196 0.83062645 0.84615385 0.85614849 0.86084906
0.84101382 0.85581395 0.86448598 0.82876712]
mean value: 0.8479928467674945
key: test_recall
value: [0.93181818 0.86363636 0.79545455 0.79545455 0.93023256 0.88372093
0.90697674 0.90909091 0.81818182 0.86363636]
mean value: 0.8698202959830866
key: train_recall
value: [0.92366412 0.92111959 0.91094148 0.92366412 0.93654822 0.92639594
0.92639594 0.93638677 0.94147583 0.92366412]
mean value: 0.9270256132057194
key: test_roc_auc
value: [0.82954545 0.79545455 0.76136364 0.76136364 0.85147992 0.8282241
0.8057611 0.80338266 0.80443975 0.82716702]
mean value: 0.8068181818181818
key: train_roc_auc
value: [0.8740458 0.88295165 0.86259542 0.8778626 0.8893937 0.88813436
0.87541171 0.88951318 0.89713385 0.8666544 ]
mean value: 0.8803696671445732
key: test_jcc
value: [0.73214286 0.67857143 0.625 0.625 0.75471698 0.71698113
0.69642857 0.70175439 0.67924528 0.71698113]
mean value: 0.6926821771409656
key: train_jcc
value: [0.78571429 0.79735683 0.76824034 0.79084967 0.80921053 0.80573951
0.78833693 0.80879121 0.82039911 0.77564103]
mean value: 0.7950279451682578
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
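The warning above names two remedies: scale the data or raise `max_iter`. The logged pipeline already applies `MinMaxScaler` to the numeric columns, so raising the iteration limit is the remaining option. The sketch below is an assumption-laden illustration on synthetic data from `make_classification`, not the project's data or script. The value `max_iter=5000` is an arbitrary choice for the example, and convergence on the real feature set is not guaranteed.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: not the katG feature matrix)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Same scaler as the logged pipeline, with a higher iteration limit than
# the LogisticRegression default of 100
clf = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(max_iter=5000, random_state=42),
)

# Turn ConvergenceWarning into an error so a silent failure cannot slip by
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    clf.fit(X, y)

n_iter = int(clf[-1].n_iter_[0])  # iterations lbfgs actually used
print(f"lbfgs stopped after {n_iter} iterations")
```

If the solver still hits the limit on the real data, the warning's other suggestion, trying a different `solver`, would be the next thing to test.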
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04097724 0.08146095 0.0438447 0.04152274 0.04924703 0.0503757
0.0411675 0.04127216 0.04899478 0.04264069]
mean value: 0.04815034866333008
key: score_time
value: [0.0130403 0.02584267 0.01314807 0.0123024 0.01325655 0.0131979
0.01327658 0.01870227 0.01221871 0.01481318]
mean value: 0.014979863166809082
key: test_mcc
value: [0.70014004 0.54772256 0.36363636 0.52394654 0.70254862 0.49497627
0.52973328 0.52312769 0.58615222 0.60920157]
mean value: 0.5581185160759019
key: train_mcc
value: [0.68354893 0.665364 0.68555338 0.68310469 0.64825655 0.66820753
0.65258177 0.66788902 0.66561315 0.66056435]
mean value: 0.6680683364521826
key: test_accuracy
value: [0.84090909 0.77272727 0.68181818 0.76136364 0.85057471 0.74712644
0.75862069 0.75862069 0.79310345 0.8045977 ]
mean value: 0.7769461859979101
key: train_accuracy
value: [0.84096692 0.83206107 0.84223919 0.84096692 0.82337992 0.83354511
0.82592122 0.83354511 0.83227446 0.82846252]
mean value: 0.8333362432143192
key: test_fscore
value: [0.85714286 0.7826087 0.68181818 0.76923077 0.84337349 0.75
0.77894737 0.77894737 0.79545455 0.80898876]
mean value: 0.784651204416148
key: train_fscore
value: [0.84624846 0.83703704 0.84653465 0.84548826 0.82944785 0.83847102
0.83023544 0.83726708 0.83663366 0.83675937]
mean value: 0.8384122841516979
key: test_precision
value: [0.77777778 0.75 0.68181818 0.74468085 0.875 0.73333333
0.71153846 0.7254902 0.79545455 0.8 ]
mean value: 0.7595093347064561
key: train_precision
value: [0.81904762 0.81294964 0.82409639 0.82211538 0.80285036 0.81534772
0.81113801 0.81796117 0.81445783 0.79723502]
mean value: 0.8137199141553185
key: test_recall
value: [0.95454545 0.81818182 0.68181818 0.79545455 0.81395349 0.76744186
0.86046512 0.84090909 0.79545455 0.81818182]
mean value: 0.8146405919661733
key: train_recall
value: [0.87531807 0.86259542 0.87022901 0.87022901 0.85786802 0.86294416
0.85025381 0.85750636 0.86005089 0.88040712]
mean value: 0.8647401867710311
key: test_roc_auc
value: [0.84090909 0.77272727 0.68181818 0.76136364 0.85015856 0.74735729
0.75977801 0.75766385 0.79307611 0.80443975]
mean value: 0.7769291754756871
key: train_roc_auc
value: [0.84096692 0.83206107 0.84223919 0.84096692 0.82333605 0.8335077
0.82589026 0.83357552 0.83230971 0.82852844]
mean value: 0.8333381769804058
key: test_jcc
value: [0.75 0.64285714 0.51724138 0.625 0.72916667 0.6
0.63793103 0.63793103 0.66037736 0.67924528]
mean value: 0.6479749899309105
key: train_jcc
value: [0.73347548 0.71974522 0.73390558 0.73233405 0.70859539 0.72186837
0.70974576 0.72008547 0.71914894 0.71933472]
mean value: 0.7218238970505827
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.95160985 1.14174104 0.97420168 1.11319375 0.93467951 1.07230449
0.9751153 0.97105861 0.97270489 0.97742367]
mean value: 1.0084032773971559
key: score_time
value: [0.01518869 0.01511359 0.0150919 0.01260114 0.01492715 0.01479912
0.01499319 0.01532936 0.01512575 0.01502037]
mean value: 0.014819025993347168
key: test_mcc
value: [0.64236405 0.59152048 0.45501576 0.59152048 0.65696218 0.65696218
0.80389885 0.64236223 0.60940803 0.5641598 ]
mean value: 0.6214174050499405
key: train_mcc
value: [0.77708835 0.79694349 0.77893113 0.80712807 0.82524165 0.77226648
0.76473299 0.82318429 0.79021233 0.81771201]
mean value: 0.7953440786857605
key: test_accuracy
value: [0.81818182 0.79545455 0.72727273 0.79545455 0.82758621 0.82758621
0.89655172 0.81609195 0.8045977 0.7816092 ]
mean value: 0.8090386624869383
key: train_accuracy
value: [0.88804071 0.89821883 0.88931298 0.90330789 0.91232529 0.88564168
0.88182973 0.91105464 0.89453621 0.90851334]
mean value: 0.8972781296578303
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_fscore
value: [0.82978723 0.8 0.72093023 0.8 0.83146067 0.83146067
0.90322581 0.83333333 0.8045977 0.79120879]
mean value: 0.8146004447058462
key: train_fscore
value: [0.89081886 0.9 0.89084065 0.905 0.91407223 0.88861386
0.88504326 0.91315136 0.89714994 0.91022444]
mean value: 0.8994914606531482
key: test_precision
value: [0.78 0.7826087 0.73809524 0.7826087 0.80434783 0.80434783
0.84 0.76923077 0.81395349 0.76595745]
mean value: 0.7881149985984872
key: train_precision
value: [0.86924939 0.88452088 0.87871287 0.88943489 0.89731051 0.86714976
0.8626506 0.89104116 0.87439614 0.89242054]
mean value: 0.8806886749617817
key: test_recall
value: [0.88636364 0.81818182 0.70454545 0.81818182 0.86046512 0.86046512
0.97674419 0.90909091 0.79545455 0.81818182]
mean value: 0.8447674418604652
key: train_recall
value: [0.91348601 0.91603053 0.90330789 0.92111959 0.93147208 0.91116751
0.90862944 0.93638677 0.92111959 0.92875318]
mean value: 0.9191472597873962
key: test_roc_auc
value: [0.81818182 0.79545455 0.72727273 0.79545455 0.82795983 0.82795983
0.897463 0.81501057 0.80470402 0.78118393]
mean value: 0.8090644820295982
key: train_roc_auc
value: [0.88804071 0.89821883 0.88931298 0.90330789 0.91230093 0.8856092
0.88179564 0.91108679 0.89456995 0.90853903]
mean value: 0.89727819325506
key: test_jcc
value: [0.70909091 0.66666667 0.56363636 0.66666667 0.71153846 0.71153846
0.82352941 0.71428571 0.67307692 0.65454545]
mean value: 0.6894575032810327
key: train_jcc
value: [0.80313199 0.81818182 0.80316742 0.82648402 0.84174312 0.79955457
0.79379157 0.84018265 0.81348315 0.83524027]
mean value: 0.817496057662837
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01658487 0.01413941 0.01192164 0.01237583 0.01122928 0.01246905
0.01264453 0.01317739 0.01210928 0.01178169]
mean value: 0.01284329891204834
key: score_time
value: [0.01262975 0.00963998 0.00926065 0.00925183 0.00913525 0.00937343
0.00943851 0.00968313 0.00934291 0.00939965]
mean value: 0.009715509414672852
key: test_mcc
value: [0.32756921 0.4328254 0.32357511 0.56950711 0.44952813 0.49497627
0.40221987 0.19637409 0.33634906 0.28752643]
mean value: 0.38204506945222433
key: train_mcc
value: [0.38790491 0.46853479 0.45357422 0.43567904 0.46977459 0.46977459
0.43495033 0.44177962 0.4034274 0.47193424]
mean value: 0.4437333727941941
key: test_accuracy
value: [0.64772727 0.71590909 0.65909091 0.78409091 0.72413793 0.74712644
0.70114943 0.59770115 0.66666667 0.64367816]
mean value: 0.6887277951933124
key: train_accuracy
value: [0.68320611 0.73409669 0.7264631 0.71755725 0.73443456 0.73443456
0.71664549 0.72045743 0.70012706 0.73570521]
mean value: 0.7203127475419588
key: test_fscore
value: [0.71028037 0.72527473 0.6875 0.79120879 0.70731707 0.75
0.69767442 0.63157895 0.65060241 0.64367816]
mean value: 0.6995114900017191
key: train_fscore
value: [0.72786885 0.73907615 0.73358116 0.72456576 0.74292743 0.74292743
0.7290401 0.72839506 0.67934783 0.74129353]
mean value: 0.7289023304804851
key: test_precision
value: [0.6031746 0.70212766 0.63461538 0.76595745 0.74358974 0.73333333
0.69767442 0.58823529 0.69230769 0.65116279]
mean value: 0.6812178366823708
key: train_precision
value: [0.63793103 0.7254902 0.71497585 0.70702179 0.72076372 0.72076372
0.6993007 0.70743405 0.72886297 0.72506083]
mean value: 0.7087604867110122
key: test_recall
value: [0.86363636 0.75 0.75 0.81818182 0.6744186 0.76744186
0.69767442 0.68181818 0.61363636 0.63636364]
mean value: 0.7253171247357294
key: train_recall
value: [0.84732824 0.75318066 0.75318066 0.74300254 0.76649746 0.76649746
0.76142132 0.75063613 0.63613232 0.75826972]
mean value: 0.7536146523553041
key: test_roc_auc
value: [0.64772727 0.71590909 0.65909091 0.78409091 0.72357294 0.74735729
0.70110994 0.59672304 0.6672833 0.64376321]
mean value: 0.6886627906976744
key: train_roc_auc
value: [0.68320611 0.73409669 0.7264631 0.71755725 0.73439377 0.73439377
0.71658852 0.72049573 0.70004585 0.73573384]
mean value: 0.7202974645122124
key: test_jcc
value: [0.55072464 0.56896552 0.52380952 0.65454545 0.54716981 0.6
0.53571429 0.46153846 0.48214286 0.47457627]
mean value: 0.5399186820180317
key: train_jcc
value: [0.57216495 0.58613861 0.57925636 0.56809339 0.59099804 0.59099804
0.57361377 0.57281553 0.51440329 0.58893281]
mean value: 0.573741479292912
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01288152 0.01601839 0.01601601 0.01600742 0.01615286 0.01606011
0.01612782 0.01603866 0.01612043 0.01610899]
mean value: 0.01575322151184082
key: score_time
value: [0.01213813 0.01230121 0.01232433 0.01243854 0.01229501 0.01237464
0.01247144 0.01235986 0.0123229 0.01231599]
mean value: 0.012334203720092774
key: test_mcc
value: [0.50051733 0.36706517 0.22941573 0.52613536 0.51718675 0.38062515
0.40330006 0.33351176 0.58699109 0.24125255]
mean value: 0.4086000957908537
key: train_mcc
value: [0.44906143 0.4633579 0.46353821 0.49784849 0.45973957 0.47436493
0.44375086 0.48223144 0.45349856 0.49303545]
mean value: 0.46804268263856164
key: test_accuracy
value: [0.75 0.68181818 0.61363636 0.76136364 0.75862069 0.68965517
0.70114943 0.66666667 0.79310345 0.62068966]
mean value: 0.7036703239289446
key: train_accuracy
value: [0.72391858 0.73155216 0.73155216 0.74681934 0.72935197 0.73697586
0.72172808 0.7407878 0.72554003 0.74587039]
mean value: 0.7334096368791849
key: test_fscore
value: [0.75555556 0.70212766 0.63829787 0.77419355 0.75294118 0.69662921
0.68292683 0.68131868 0.79069767 0.63736264]
mean value: 0.7112050848179496
key: train_fscore
value: [0.73374233 0.7359199 0.73723537 0.76224612 0.73865031 0.74285714
0.72727273 0.74689826 0.73849879 0.75429975]
mean value: 0.7417620699172001
key: test_precision
value: [0.73913043 0.66 0.6 0.73469388 0.76190476 0.67391304
0.71794872 0.65957447 0.80952381 0.61702128]
mean value: 0.697371038987003
key: train_precision
value: [0.70853081 0.72413793 0.72195122 0.71846847 0.71496437 0.72749392
0.71393643 0.72881356 0.70438799 0.72921615]
mean value: 0.7191900844944616
key: test_recall
value: [0.77272727 0.75 0.68181818 0.81818182 0.74418605 0.72093023
0.65116279 0.70454545 0.77272727 0.65909091]
mean value: 0.727536997885835
key: train_recall
value: [0.76081425 0.7480916 0.75318066 0.81170483 0.76395939 0.75888325
0.74111675 0.76590331 0.77608142 0.78117048]
mean value: 0.766090595574844
key: test_roc_auc
value: [0.75 0.68181818 0.61363636 0.76136364 0.75845666 0.69001057
0.7005814 0.66622622 0.79334038 0.62024313]
mean value: 0.7035676532769556
key: train_roc_auc
value: [0.72391858 0.73155216 0.73155216 0.74681934 0.72930794 0.73694799
0.72170341 0.74081967 0.72560416 0.74591519]
mean value: 0.7334140607845416
key: test_jcc
value: [0.60714286 0.54098361 0.46875 0.63157895 0.60377358 0.53448276
0.51851852 0.51666667 0.65384615 0.46774194]
mean value: 0.5543485029110216
key: train_jcc
value: [0.57945736 0.58217822 0.58382643 0.61583012 0.58560311 0.59090909
0.57142857 0.5960396 0.58541267 0.60552268]
mean value: 0.5896207857503801
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01600981 0.01106691 0.010391 0.01046634 0.01080656 0.01074886
0.01149368 0.01192141 0.01154351 0.01168633]
mean value: 0.01161344051361084
key: score_time
value: [0.0355916 0.01523995 0.01347685 0.01362085 0.01415038 0.01872969
0.0192039 0.01468134 0.01902127 0.01453424]
mean value: 0.017825007438659668
key: test_mcc
value: [0.38646346 0.50051733 0.18353259 0.52286233 0.49497627 0.36069346
0.37916452 0.31021744 0.42577098 0.33458714]
mean value: 0.38987855000800115
key: train_mcc
value: [0.58032736 0.60193114 0.55910844 0.57224844 0.59486336 0.60956553
0.62834476 0.57455631 0.59101716 0.5941861 ]
mean value: 0.5906148615612009
key: test_accuracy
value: [0.69318182 0.75 0.59090909 0.76136364 0.74712644 0.67816092
0.68965517 0.65517241 0.71264368 0.66666667]
mean value: 0.6944879832810867
key: train_accuracy
value: [0.78880407 0.80025445 0.77862595 0.78498728 0.79669632 0.8043202
0.81321474 0.78653113 0.79542567 0.79669632]
mean value: 0.7945556126754416
key: test_fscore
value: [0.69662921 0.75555556 0.56097561 0.76404494 0.75 0.69565217
0.68235294 0.66666667 0.72527473 0.68817204]
mean value: 0.6985323872656682
key: train_fscore
value: [0.79854369 0.80688807 0.78728606 0.79415347 0.80392157 0.80987654
0.82051282 0.79361179 0.79748428 0.80148883]
mean value: 0.801376712958553
key: test_precision
value: [0.68888889 0.73913043 0.60526316 0.75555556 0.73333333 0.65306122
0.69047619 0.65217391 0.70212766 0.65306122]
mean value: 0.6873071582528851
key: train_precision
value: [0.76334107 0.78095238 0.75764706 0.76168224 0.77725118 0.78846154
0.79058824 0.7672209 0.78855721 0.78208232]
mean value: 0.7757784149640108
key: test_recall
value: [0.70454545 0.77272727 0.52272727 0.77272727 0.76744186 0.74418605
0.6744186 0.68181818 0.75 0.72727273]
mean value: 0.7117864693446089
key: train_recall
value: [0.83715013 0.8346056 0.81933842 0.82951654 0.83248731 0.83248731
0.85279188 0.82188295 0.80661578 0.82188295]
mean value: 0.8288758863874143
key: test_roc_auc
value: [0.69318182 0.75 0.59090909 0.76136364 0.74735729 0.67891121
0.68948203 0.65486258 0.7122093 0.66596195]
mean value: 0.694423890063425
key: train_roc_auc
value: [0.78880407 0.80025445 0.77862595 0.78498728 0.79665078 0.80428437
0.81316439 0.78657599 0.79543987 0.79672828]
mean value: 0.7945515428630475
key: test_jcc
value: [0.53448276 0.60714286 0.38983051 0.61818182 0.6 0.53333333
0.51785714 0.5 0.56896552 0.52459016]
mean value: 0.5394384099786222
key: train_jcc
value: [0.66464646 0.67628866 0.64919355 0.65858586 0.67213115 0.68049793
0.69565217 0.65784114 0.66317992 0.66873706]
mean value: 0.6686753895067395
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
Model_name: SVM
Model func: SVC(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.05384135 0.05580044 0.0459094 0.0482986 0.04752588 0.04734731
0.0462234 0.04590893 0.04742646 0.04682469]
mean value: 0.04851064682006836
key: score_time
value: [0.01803613 0.01803875 0.01764989 0.01866722 0.01710939 0.01780176
0.01760387 0.01767254 0.01786971 0.0177207 ]
mean value: 0.017816996574401854
key: test_mcc
value: [0.62330229 0.50471461 0.45643546 0.57551157 0.65520898 0.54610162
0.53589621 0.48553084 0.59547841 0.50171077]
mean value: 0.5479890752759377
key: train_mcc
value: [0.65775744 0.67329502 0.64908799 0.64041864 0.62129367 0.65514187
0.64946023 0.6555617 0.65473654 0.64687618]
mean value: 0.6503629283381792
key: test_accuracy
value: [0.79545455 0.75 0.72727273 0.78409091 0.82758621 0.77011494
0.75862069 0.73563218 0.79310345 0.74712644]
mean value: 0.7689002089864159
key: train_accuracy
value: [0.82315522 0.8307888 0.82061069 0.81552163 0.80559085 0.82337992
0.82083863 0.82465057 0.82210928 0.81575604]
mean value: 0.820240162177367
key: test_fscore
value: [0.82352941 0.76595745 0.73913043 0.8 0.82352941 0.7826087
0.78350515 0.76767677 0.8125 0.77083333]
mean value: 0.7869270656421982
key: train_fscore
value: [0.83818393 0.8451688 0.83353011 0.83001172 0.82188591 0.83666275
0.83392226 0.83571429 0.8364486 0.83352468]
mean value: 0.8345053058485761
key: test_precision
value: [0.72413793 0.72 0.70833333 0.74509804 0.83333333 0.73469388
0.7037037 0.69090909 0.75 0.71153846]
mean value: 0.7321747770619113
key: train_precision
value: [0.77253219 0.77896996 0.77753304 0.76956522 0.75913978 0.77899344
0.77802198 0.7852349 0.77321814 0.75941423]
mean value: 0.7732622869197299
key: test_recall
value: [0.95454545 0.81818182 0.77272727 0.86363636 0.81395349 0.8372093
0.88372093 0.86363636 0.88636364 0.84090909]
mean value: 0.8534883720930233
key: train_recall
value: [0.91603053 0.92366412 0.89821883 0.90076336 0.89593909 0.9035533
0.89847716 0.89312977 0.91094148 0.92366412]
mean value: 0.9064381756887666
key: test_roc_auc
value: [0.79545455 0.75 0.72727273 0.78409091 0.82743129 0.77087738
0.76004228 0.73414376 0.79201903 0.74603594]
mean value: 0.7687367864693446
key: train_roc_auc
value: [0.82315522 0.8307888 0.82061069 0.81552163 0.8054759 0.82327792
0.82073985 0.82473747 0.82222201 0.81589297]
mean value: 0.820242246935586
key: test_jcc
value: [0.7 0.62068966 0.5862069 0.66666667 0.7 0.64285714
0.6440678 0.62295082 0.68421053 0.62711864]
mean value: 0.6494768147913834
key: train_jcc
value: [0.72144289 0.73185484 0.7145749 0.70941884 0.69762846 0.71919192
0.71515152 0.71779141 0.7188755 0.71456693]
mean value: 0.716049719596829
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.45261788 2.10335112 2.15535569 1.10358024 2.93123484 2.78560662
1.2532444 2.88533401 2.78543663 2.89050388]
mean value: 2.234626531600952
key: score_time
value: [0.01254225 0.01251721 0.01264668 0.01255536 0.01265812 0.01263404
0.01261973 0.0126524 0.01485896 0.01480293]
mean value: 0.013048768043518066
key: test_mcc
value: [0.62689067 0.61506768 0.48342972 0.62155249 0.67866682 0.64384947
0.61102358 0.63444041 0.61269937 0.58615222]
mean value: 0.6113772430907065
key: train_mcc
value: [0.86784408 0.90330789 0.89428856 0.77424761 0.92069685 0.91319934
0.79345585 0.9496176 0.91830829 0.94977755]
mean value: 0.8884743609283701
key: test_accuracy
value: [0.80681818 0.80681818 0.73863636 0.80681818 0.83908046 0.81609195
0.79310345 0.81609195 0.8045977 0.79310345]
mean value: 0.802115987460815
key: train_accuracy
value: [0.93256997 0.95165394 0.94656489 0.88549618 0.95933926 0.95552732
0.89326557 0.97458704 0.95806861 0.97458704]
mean value: 0.9431659828446349
key: test_fscore
value: [0.82474227 0.81318681 0.71604938 0.82105263 0.83333333 0.82978723
0.81632653 0.82608696 0.8172043 0.79545455]
mean value: 0.8093223996562732
key: train_fscore
value: [0.93512852 0.95165394 0.94516971 0.89051095 0.95800525 0.95705521
0.9 0.97493734 0.95940959 0.975 ]
mean value: 0.9446870526213144
key: test_precision
value: [0.75471698 0.78723404 0.78378378 0.76470588 0.85365854 0.76470588
0.72727273 0.79166667 0.7755102 0.79545455]
mean value: 0.7798709252235871
key: train_precision
value: [0.9009434 0.95165394 0.97050938 0.85314685 0.99184783 0.9263658
0.84753363 0.96049383 0.92857143 0.95823096]
mean value: 0.9289297044832938
key: test_recall
value: [0.90909091 0.84090909 0.65909091 0.88636364 0.81395349 0.90697674
0.93023256 0.86363636 0.86363636 0.79545455]
mean value: 0.8469344608879492
key: train_recall
value: [0.97201018 0.95165394 0.92111959 0.93129771 0.92639594 0.98984772
0.95939086 0.98982188 0.99236641 0.99236641]
mean value: 0.9626270650082019
key: test_roc_auc
value: [0.80681818 0.80681818 0.73863636 0.80681818 0.83879493 0.81712474
0.79466173 0.81553911 0.80391121 0.79307611]
mean value: 0.8022198731501057
key: train_roc_auc
value: [0.93256997 0.95165394 0.94656489 0.88549618 0.95938118 0.95548365
0.89318144 0.97460637 0.95811214 0.9746096 ]
mean value: 0.9431659368905078
key: test_jcc
value: [0.70175439 0.68518519 0.55769231 0.69642857 0.71428571 0.70909091
0.68965517 0.7037037 0.69090909 0.66037736]
mean value: 0.6809082399164754
key: train_jcc
value: [0.87816092 0.90776699 0.8960396 0.80263158 0.91939547 0.91764706
0.81818182 0.95110024 0.92198582 0.95121951]
mean value: 0.8964129008036302
MCC on Blind test: 0.32
Accuracy on Blind test: 0.68
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.0604918 0.04539132 0.05021358 0.04280376 0.04391217 0.04870415
0.05093575 0.04413128 0.04482388 0.04480577]
mean value: 0.04762134552001953
key: score_time
value: [0.00954056 0.00907254 0.00901723 0.00905633 0.00914407 0.00906348
0.00919557 0.00921273 0.00918293 0.00919628]
mean value: 0.009168171882629394
key: test_mcc
value: [0.84287052 0.75174939 0.64236405 0.81902836 0.72746922 0.81606765
0.74735729 0.7472238 0.65539112 0.65905141]
mean value: 0.7408572818519341
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92045455 0.875 0.81818182 0.90909091 0.86206897 0.90804598
0.87356322 0.87356322 0.82758621 0.82758621]
mean value: 0.869514106583072
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.87912088 0.82978723 0.91111111 0.86666667 0.90697674
0.87356322 0.87640449 0.82758621 0.83870968]
mean value: 0.8733003155292913
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89361702 0.85106383 0.78 0.89130435 0.82978723 0.90697674
0.86363636 0.86666667 0.8372093 0.79591837]
mean value: 0.8516179877094067
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.90909091 0.88636364 0.93181818 0.90697674 0.90697674
0.88372093 0.88636364 0.81818182 0.88636364]
mean value: 0.8970401691331924
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92045455 0.875 0.81818182 0.90909091 0.86257928 0.90803383
0.87367865 0.87341438 0.82769556 0.82690275]
mean value: 0.8695031712473573
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.78431373 0.70909091 0.83673469 0.76470588 0.82978723
0.7755102 0.78 0.70588235 0.72222222]
mean value: 0.7765390081242038
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.16874266 0.17347503 0.1705687 0.17209721 0.17211318 0.17244864
0.17359471 0.17075062 0.17084527 0.17014956]
mean value: 0.1714785575866699
key: score_time
value: [0.01893568 0.02032566 0.01877761 0.01896501 0.02001143 0.01879668
0.01911426 0.01880836 0.01880097 0.0191679 ]
mean value: 0.019170355796813966
key: test_mcc
value: [0.77352678 0.54601891 0.65926119 0.63702206 0.67900591 0.68515773
0.54295079 0.52312769 0.70301836 0.63213531]
mean value: 0.6381224729589329
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.88636364 0.77272727 0.82954545 0.81818182 0.83908046 0.83908046
0.77011494 0.75862069 0.85057471 0.81609195]
mean value: 0.8180381400208986
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.88888889 0.77777778 0.82758621 0.82222222 0.84090909 0.84782609
0.77777778 0.77894737 0.84705882 0.81818182]
mean value: 0.8227176061561114
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.86956522 0.76086957 0.8372093 0.80434783 0.82222222 0.79591837
0.74468085 0.7254902 0.87804878 0.81818182]
mean value: 0.8056534146402279
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.79545455 0.81818182 0.84090909 0.86046512 0.90697674
0.81395349 0.84090909 0.81818182 0.81818182]
mean value: 0.84223044397463
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.88636364 0.77272727 0.82954545 0.81818182 0.83932347 0.83985201
0.77061311 0.75766385 0.85095137 0.81606765]
mean value: 0.8181289640591967
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8 0.63636364 0.70588235 0.69811321 0.7254902 0.73584906
0.63636364 0.63793103 0.73469388 0.69230769]
mean value: 0.7002994690239295
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01320601 0.01333499 0.01326704 0.01336837 0.0133636 0.01205587
0.01195025 0.01203203 0.01198006 0.01238704]
mean value: 0.012694525718688964
key: score_time
value: [0.0099504 0.00991678 0.00989676 0.00985718 0.0095439 0.00915027
0.00912404 0.00921392 0.00910091 0.00892544]
mean value: 0.0094679594039917
key: test_mcc
value: [0.50471461 0.43738879 0.29553088 0.61379491 0.3853797 0.5504913
0.33456898 0.42976952 0.40221987 0.4957562 ]
mean value: 0.44496147605893255
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.75 0.71590909 0.64772727 0.80681818 0.68965517 0.77011494
0.66666667 0.71264368 0.70114943 0.74712644]
mean value: 0.7207810867293626
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.76595745 0.73684211 0.65168539 0.8045977 0.70967742 0.78723404
0.6741573 0.73684211 0.70454545 0.76086957]
mean value: 0.7332408536784342
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72 0.68627451 0.64444444 0.81395349 0.66 0.7254902
0.65217391 0.68627451 0.70454545 0.72916667]
mean value: 0.7022323182758412
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.81818182 0.79545455 0.65909091 0.79545455 0.76744186 0.86046512
0.69767442 0.79545455 0.70454545 0.79545455]
mean value: 0.7689217758985201
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75 0.71590909 0.64772727 0.80681818 0.69053911 0.77114165
0.66701903 0.71168076 0.70110994 0.74656448]
mean value: 0.7208509513742072
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.62068966 0.58333333 0.48333333 0.67307692 0.55 0.64912281
0.50847458 0.58333333 0.54385965 0.61403509]
mean value: 0.5809258698380173
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.75696111 2.67333555 2.62944531 2.70618653 2.58711076 2.66426349
2.66188812 2.57534146 2.5954206 2.59278345]
mean value: 2.6442736387252808
key: score_time
value: [0.10721159 0.10074186 0.10092258 0.10331321 0.09919405 0.10507584
0.09974909 0.10483932 0.09813857 0.0980525 ]
mean value: 0.10172386169433593
key: test_mcc
value: [0.86452993 0.82589664 0.75019377 0.75174939 0.77786181 0.84118687
0.85040097 0.79810753 0.79334038 0.86289151]
mean value: 0.8116158816503178
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93181818 0.90909091 0.875 0.875 0.88505747 0.91954023
0.91954023 0.89655172 0.89655172 0.93103448]
mean value: 0.9039184952978057
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93333333 0.91489362 0.87640449 0.87912088 0.89130435 0.92134831
0.92473118 0.90322581 0.89655172 0.93333333]
mean value: 0.9074247033008916
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91304348 0.86 0.86666667 0.85106383 0.83673469 0.89130435
0.86 0.85714286 0.90697674 0.91304348]
mean value: 0.8755976096008181
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.97727273 0.88636364 0.90909091 0.95348837 0.95348837
1. 0.95454545 0.88636364 0.95454545]
mean value: 0.942970401691332
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93181818 0.90909091 0.875 0.875 0.8858351 0.919926
0.92045455 0.89587738 0.89667019 0.9307611 ]
mean value: 0.9040433403805497
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.875 0.84313725 0.78 0.78431373 0.80392157 0.85416667
0.86 0.82352941 0.8125 0.875 ]
mean value: 0.831156862745098
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.64
Accuracy on Blind test: 0.82
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.1576159 1.13111639 1.09706926 1.06529927 1.07890487 1.10068297
1.08443117 1.10074687 1.09086037 1.1387794 ]
mean value: 1.1045506477355957
key: score_time
value: [0.27794409 0.15603256 0.28928804 0.25447321 0.27145886 0.28354359
0.27408671 0.28654885 0.28274965 0.26291871]
mean value: 0.26390442848205564
key: test_mcc
value: [0.88843109 0.8057162 0.72802521 0.81902836 0.70301836 0.77008457
0.84485784 0.74867823 0.81702814 0.86289151]
mean value: 0.798775949639167
key: train_mcc
value: [0.90651296 0.91391215 0.90099965 0.90125658 0.91171289 0.901238
0.90632277 0.92681126 0.90870336 0.9217261 ]
mean value: 0.9099195719638971
key: test_accuracy
value: [0.94318182 0.89772727 0.86363636 0.90909091 0.85057471 0.88505747
0.91954023 0.87356322 0.90804598 0.93103448]
mean value: 0.8981452455590386
key: train_accuracy
value: [0.95292621 0.956743 0.95038168 0.95038168 0.95552732 0.95044473
0.95298602 0.96315121 0.95425667 0.96060991]
mean value: 0.9547408427661975
key: test_fscore
value: [0.94505495 0.90526316 0.86666667 0.91111111 0.85393258 0.88372093
0.92307692 0.87912088 0.90697674 0.93333333]
mean value: 0.9008257274946863
key: train_fscore
value: [0.95380774 0.95739348 0.9509434 0.95118899 0.95641345 0.95118899
0.95369212 0.96370463 0.95465995 0.9612015 ]
mean value: 0.9554194239721927
key: test_precision
value: [0.91489362 0.84313725 0.84782609 0.89130435 0.82608696 0.88372093
0.875 0.85106383 0.92857143 0.91304348]
mean value: 0.8774647930079675
key: train_precision
value: [0.93627451 0.94320988 0.94029851 0.93596059 0.93887531 0.9382716
0.94074074 0.94827586 0.94513716 0.94581281]
mean value: 0.9412856963303278
key: test_recall
value: [0.97727273 0.97727273 0.88636364 0.93181818 0.88372093 0.88372093
0.97674419 0.90909091 0.88636364 0.95454545]
mean value: 0.9266913319238901
key: train_recall
value: [0.97201018 0.97201018 0.96183206 0.96692112 0.97461929 0.96446701
0.96700508 0.97964377 0.96437659 0.97709924]
mean value: 0.9699984500329368
key: test_roc_auc
value: [0.94318182 0.89772727 0.86363636 0.90909091 0.85095137 0.88504228
0.92019027 0.87315011 0.9082981 0.9307611 ]
mean value: 0.8982029598308667
key: train_roc_auc
value: [0.95292621 0.956743 0.95038168 0.95038168 0.95550303 0.95042689
0.95296819 0.96317214 0.95426951 0.96063084]
mean value: 0.954740315934953
key: test_jcc
value: [0.89583333 0.82692308 0.76470588 0.83673469 0.74509804 0.79166667
0.85714286 0.78431373 0.82978723 0.875 ]
mean value: 0.8207205509044861
key: train_jcc
value: [0.91169451 0.91826923 0.90647482 0.90692124 0.91646778 0.90692124
0.91148325 0.92995169 0.91325301 0.9253012 ]
mean value: 0.9146737985460048
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0279119 0.01599693 0.01595879 0.0159502 0.01609945 0.0159781
0.02555108 0.01599431 0.01797676 0.01640034]
mean value: 0.018381786346435548
key: score_time
value: [0.01233649 0.01231313 0.01254392 0.01234722 0.0123775 0.01232219
0.01229525 0.01237607 0.01264095 0.01244164]
mean value: 0.012399435043334961
key: test_mcc
value: [0.50051733 0.36706517 0.22941573 0.52613536 0.51718675 0.38062515
0.40330006 0.33351176 0.58699109 0.24125255]
mean value: 0.4086000957908537
key: train_mcc
value: [0.44906143 0.4633579 0.46353821 0.49784849 0.45973957 0.47436493
0.44375086 0.48223144 0.45349856 0.49303545]
mean value: 0.46804268263856164
key: test_accuracy
value: [0.75 0.68181818 0.61363636 0.76136364 0.75862069 0.68965517
0.70114943 0.66666667 0.79310345 0.62068966]
mean value: 0.7036703239289446
key: train_accuracy
value: [0.72391858 0.73155216 0.73155216 0.74681934 0.72935197 0.73697586
0.72172808 0.7407878 0.72554003 0.74587039]
mean value: 0.7334096368791849
key: test_fscore
value: [0.75555556 0.70212766 0.63829787 0.77419355 0.75294118 0.69662921
0.68292683 0.68131868 0.79069767 0.63736264]
mean value: 0.7112050848179496
key: train_fscore
value: [0.73374233 0.7359199 0.73723537 0.76224612 0.73865031 0.74285714
0.72727273 0.74689826 0.73849879 0.75429975]
mean value: 0.7417620699172001
key: test_precision
value: [0.73913043 0.66 0.6 0.73469388 0.76190476 0.67391304
0.71794872 0.65957447 0.80952381 0.61702128]
mean value: 0.697371038987003
key: train_precision
value: [0.70853081 0.72413793 0.72195122 0.71846847 0.71496437 0.72749392
0.71393643 0.72881356 0.70438799 0.72921615]
mean value: 0.7191900844944616
key: test_recall
value: [0.77272727 0.75 0.68181818 0.81818182 0.74418605 0.72093023
0.65116279 0.70454545 0.77272727 0.65909091]
mean value: 0.727536997885835
key: train_recall
value: [0.76081425 0.7480916 0.75318066 0.81170483 0.76395939 0.75888325
0.74111675 0.76590331 0.77608142 0.78117048]
mean value: 0.766090595574844
key: test_roc_auc
value: [0.75 0.68181818 0.61363636 0.76136364 0.75845666 0.69001057
0.7005814 0.66622622 0.79334038 0.62024313]
mean value: 0.7035676532769556
key: train_roc_auc
value: [0.72391858 0.73155216 0.73155216 0.74681934 0.72930794 0.73694799
0.72170341 0.74081967 0.72560416 0.74591519]
mean value: 0.7334140607845416
key: test_jcc
value: [0.60714286 0.54098361 0.46875 0.63157895 0.60377358 0.53448276
0.51851852 0.51666667 0.65384615 0.46774194]
mean value: 0.5543485029110216
key: train_jcc
value: [0.57945736 0.58217822 0.58382643 0.61583012 0.58560311 0.59090909
0.57142857 0.5960396 0.58541267 0.60552268]
mean value: 0.5896207857503801
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.16803455 0.12479401 0.13190556 0.13270855 0.13482141 0.12760592
0.12809372 0.13602591 0.12585688 0.12302065]
mean value: 0.133286714553833
key: score_time
value: [0.01130319 0.01135063 0.01241302 0.01135039 0.01175189 0.01135063
0.01132727 0.01134205 0.01131344 0.01137137]
mean value: 0.011487388610839843
key: test_mcc
value: [0.88659264 0.77594029 0.79566006 0.84287052 0.77008457 0.86205074
0.86585804 0.79323121 0.84118687 0.81683533]
mean value: 0.8250310278177166
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94318182 0.88636364 0.89772727 0.92045455 0.88505747 0.93103448
0.93103448 0.89655172 0.91954023 0.90804598]
mean value: 0.9118991640543365
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94252874 0.89130435 0.89655172 0.92307692 0.88372093 0.93023256
0.93333333 0.8988764 0.91764706 0.91111111]
mean value: 0.9128383126807573
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95348837 0.85416667 0.90697674 0.89361702 0.88372093 0.93023256
0.89361702 0.88888889 0.95121951 0.89130435]
mean value: 0.9047232062781119
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.93181818 0.93181818 0.88636364 0.95454545 0.88372093 0.93023256
0.97674419 0.90909091 0.88636364 0.93181818]
mean value: 0.9222515856236786
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94318182 0.88636364 0.89772727 0.92045455 0.88504228 0.93102537
0.93155391 0.89640592 0.919926 0.90776956]
mean value: 0.9119450317124735
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.89130435 0.80392157 0.8125 0.85714286 0.79166667 0.86956522
0.875 0.81632653 0.84782609 0.83673469]
mean value: 0.8401987969100684
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.04277778 0.06277061 0.09687138 0.06872678 0.07880187 0.06946063
0.08058167 0.06243658 0.07856321 0.05238271]
mean value: 0.06933732032775879
key: score_time
value: [0.01239014 0.01243758 0.02226973 0.01272941 0.01268435 0.01261806
0.01257753 0.01916575 0.01260209 0.01247287]
mean value: 0.014194750785827636
key: test_mcc
value: [0.64715023 0.59648091 0.38726484 0.52613536 0.65994555 0.61371748
0.70637613 0.66885041 0.51718675 0.54016913]
mean value: 0.586327679755505
key: train_mcc
value: [0.76304068 0.7600656 0.77117136 0.76814463 0.75793471 0.75767069
0.74972171 0.77835845 0.76606319 0.76083987]
mean value: 0.7633010876409456
key: test_accuracy
value: [0.81818182 0.79545455 0.69318182 0.76136364 0.82758621 0.8045977
0.83908046 0.82758621 0.75862069 0.77011494]
mean value: 0.789576802507837
key: train_accuracy
value: [0.88040712 0.87913486 0.88422392 0.88295165 0.87801779 0.87801779
0.8729352 0.88818297 0.88182973 0.87801779]
mean value: 0.8803718827899939
key: test_fscore
value: [0.83333333 0.80851064 0.7032967 0.77419355 0.83516484 0.81318681
0.85714286 0.84536082 0.76404494 0.77272727]
mean value: 0.8006961770099277
key: train_fscore
value: [0.88480392 0.88314883 0.88888889 0.8872549 0.88235294 0.88206388
0.87922705 0.89189189 0.88616891 0.88433735]
mean value: 0.8850138572225262
key: test_precision
value: [0.76923077 0.76 0.68085106 0.73469388 0.79166667 0.77083333
0.76363636 0.77358491 0.75555556 0.77272727]
mean value: 0.7572779808191146
key: train_precision
value: [0.8534279 0.8547619 0.85446009 0.85579196 0.85308057 0.8547619
0.83870968 0.86223278 0.85377358 0.83981693]
mean value: 0.8520817305357777
key: test_recall
value: [0.90909091 0.86363636 0.72727273 0.81818182 0.88372093 0.86046512
0.97674419 0.93181818 0.77272727 0.77272727]
mean value: 0.8516384778012684
key: train_recall
value: [0.91857506 0.91348601 0.92620865 0.92111959 0.91370558 0.91116751
0.92385787 0.92366412 0.92111959 0.93384224]
mean value: 0.9206746231642577
key: test_roc_auc
value: [0.81818182 0.79545455 0.69318182 0.76136364 0.8282241 0.80523256
0.84064482 0.82637421 0.75845666 0.77008457]
mean value: 0.7897198731501057
key: train_roc_auc
value: [0.88040712 0.87913486 0.88422392 0.88295165 0.87797238 0.87797561
0.87287041 0.888228 0.88187959 0.87808863]
mean value: 0.8803732191524264
key: test_jcc
value: [0.71428571 0.67857143 0.54237288 0.63157895 0.71698113 0.68518519
0.75 0.73214286 0.61818182 0.62962963]
mean value: 0.6698929593796458
key: train_jcc
value: [0.79340659 0.7907489 0.8 0.79735683 0.78947368 0.78901099
0.78448276 0.80487805 0.7956044 0.79265659]
mean value: 0.793761878397893
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01543641 0.01584482 0.01554322 0.01582074 0.01554036 0.01604772
0.01601362 0.01568246 0.01603746 0.0155468 ]
mean value: 0.015751361846923828
key: score_time
value: [0.01288176 0.01244569 0.01278973 0.01243854 0.01280117 0.01244664
0.01247144 0.0124588 0.0125134 0.01243162]
mean value: 0.012567877769470215
key: test_mcc
value: [0.51970115 0.4328254 0.38726484 0.60092521 0.5404983 0.49418605
0.35843235 0.28973226 0.51803019 0.33641135]
mean value: 0.4478007105008516
key: train_mcc
value: [0.4672002 0.45784843 0.47741223 0.48485612 0.45579637 0.4606799
0.47622996 0.47323703 0.46038218 0.47323703]
mean value: 0.46868794567069183
key: test_accuracy
value: [0.75 0.71590909 0.69318182 0.79545455 0.77011494 0.74712644
0.67816092 0.64367816 0.75862069 0.66666667]
mean value: 0.7218913270637408
key: train_accuracy
value: [0.73282443 0.72773537 0.73791349 0.74045802 0.72681067 0.72935197
0.73697586 0.73570521 0.72935197 0.73570521]
mean value: 0.7332832187163545
key: test_fscore
value: [0.78 0.72527473 0.7032967 0.8125 0.76190476 0.74418605
0.68888889 0.67368421 0.76923077 0.69473684]
mean value: 0.7353702947739056
key: train_fscore
value: [0.74327628 0.7409201 0.74816626 0.75598086 0.74002418 0.74181818
0.7496977 0.74634146 0.73992674 0.74634146]
mean value: 0.7452493235793951
key: test_precision
value: [0.69642857 0.70212766 0.68085106 0.75 0.7804878 0.74418605
0.65957447 0.62745098 0.74468085 0.64705882]
mean value: 0.7032846269293008
key: train_precision
value: [0.71529412 0.70669746 0.72 0.71331828 0.70669746 0.7099768
0.71593533 0.71662763 0.71126761 0.71662763]
mean value: 0.7132442329211506
key: test_recall
value: [0.88636364 0.75 0.72727273 0.88636364 0.74418605 0.74418605
0.72093023 0.72727273 0.79545455 0.75 ]
mean value: 0.7732029598308668
key: train_recall
value: [0.7735369 0.77862595 0.77862595 0.80407125 0.77664975 0.77664975
0.78680203 0.77862595 0.77099237 0.77862595]
mean value: 0.7803205848542385
key: test_roc_auc
value: [0.75 0.71590909 0.69318182 0.79545455 0.7698203 0.74709302
0.67864693 0.64270613 0.75819239 0.66569767]
mean value: 0.7216701902748415
key: train_roc_auc
value: [0.73282443 0.72773537 0.73791349 0.74045802 0.72674726 0.72929179
0.73691247 0.73575968 0.72940481 0.73575968]
mean value: 0.7332806990351455
key: test_jcc
value: [0.63934426 0.56896552 0.54237288 0.68421053 0.61538462 0.59259259
0.52542373 0.50793651 0.625 0.53225806]
mean value: 0.5833488696451588
key: train_jcc
value: [0.59143969 0.58846154 0.59765625 0.60769231 0.58733205 0.58959538
0.59961315 0.59533074 0.5872093 0.59533074]
mean value: 0.593966114806459
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03588009 0.02543688 0.0264957 0.03651309 0.03167486 0.02774239
0.02617311 0.03044653 0.02691627 0.02822828]
mean value: 0.029550719261169433
key: score_time
value: [0.01276159 0.01245403 0.01267171 0.01251268 0.01240969 0.01239777
0.01264715 0.01272607 0.012429 0.01247954]
mean value: 0.01254892349243164
key: test_mcc
value: [0.54772256 0.3796283 0.26490647 0.61419227 0.65696218 0.49974958
0.56342495 0.35625628 0.46314724 0.49682118]
mean value: 0.48428110037982225
key: train_mcc
value: [0.68816837 0.47401498 0.41612519 0.65815286 0.72539042 0.678299
0.62574484 0.35995489 0.53423919 0.66801919]
mean value: 0.582810893164612
key: test_accuracy
value: [0.77272727 0.65909091 0.59090909 0.79545455 0.82758621 0.74712644
0.7816092 0.6091954 0.70114943 0.74712644]
mean value: 0.7231974921630094
key: train_accuracy
value: [0.83842239 0.69720102 0.65267176 0.81170483 0.86022872 0.8360864
0.81194409 0.61880559 0.73189327 0.83354511]
mean value: 0.7692503176620076
key: test_fscore
value: [0.76190476 0.53125 0.35714286 0.82 0.83146067 0.76086957
0.7816092 0.37037037 0.76363636 0.73809524]
mean value: 0.6716339025926584
key: train_fscore
value: [0.82237762 0.58098592 0.47398844 0.8377193 0.86810552 0.84661118
0.80474934 0.3877551 0.78491335 0.82875817]
mean value: 0.7235963934245662
key: test_precision
value: [0.8 0.85 0.83333333 0.73214286 0.80434783 0.71428571
0.77272727 1. 0.63636364 0.775 ]
mean value: 0.7918200639939771
key: train_precision
value: [0.91304348 0.94285714 0.97619048 0.73603083 0.82272727 0.79642058
0.83791209 0.97938144 0.6547619 0.85215054]
mean value: 0.851147575381499
key: test_recall
value: [0.72727273 0.38636364 0.22727273 0.93181818 0.86046512 0.81395349
0.79069767 0.22727273 0.95454545 0.70454545]
mean value: 0.6624207188160677
key: train_recall
value: [0.7480916 0.41984733 0.3129771 0.97201018 0.91878173 0.9035533
0.77411168 0.24173028 0.97964377 0.80661578]
mean value: 0.7077362731041965
key: test_roc_auc
value: [0.77272727 0.65909091 0.59090909 0.79545455 0.82795983 0.74788584
0.78171247 0.61363636 0.69820296 0.74762156]
mean value: 0.7235200845665962
key: train_roc_auc
value: [0.83842239 0.69720102 0.65267176 0.81170483 0.86015422 0.83600057
0.81199222 0.61832707 0.73220767 0.83351093]
mean value: 0.769219268673874
key: test_jcc
value: [0.61538462 0.36170213 0.2173913 0.69491525 0.71153846 0.61403509
0.64150943 0.22727273 0.61764706 0.58490566]
mean value: 0.5286301731322943
key: train_jcc
value: [0.69833729 0.40942928 0.31060606 0.72075472 0.76694915 0.73402062
0.67328918 0.24050633 0.64597315 0.70758929]
mean value: 0.5907455073658393
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
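The fold-level metrics logged above are all functions of one confusion matrix per fold. As a cross-check, the sketch below rebuilds the fold-1 test scores of the Passive Aggressive model from a set of confusion counts. The counts themselves are not printed in the log: they are an inference from test_recall (0.72727273) and test_precision (0.8), assuming an 88-sample fold with 44 positives and 44 negatives. The logged test_roc_auc matches the mean of sensitivity and specificity, which is what ROC AUC reduces to when it is computed from hard class labels; whether the script scores labels rather than decision values is likewise an assumption.

```python
from math import sqrt

# Hypothetical fold-1 confusion counts (NOT in the log); inferred from the
# logged recall and precision, assuming 44 positives and 44 negatives.
tp, fp, fn, tn = 32, 8, 12, 36

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)       # F1
jcc = tp / (tp + fp + fn)                  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# ROC AUC from hard labels: mean of sensitivity and specificity.
roc_auc_hard = 0.5 * (recall + tn / (tn + fp))

for name, value in [
    ("test_mcc", mcc),
    ("test_accuracy", accuracy),
    ("test_fscore", fscore),
    ("test_precision", precision),
    ("test_recall", recall),
    ("test_roc_auc", roc_auc_hard),
    ("test_jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

The same counts also show why test_jcc tracks test_fscore so closely: for a single class, Jaccard = F1 / (2 - F1).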
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03075266 0.03120208 0.03316736 0.03134346 0.03602695 0.03724575
0.04394269 0.03862977 0.03296971 0.04053521]
mean value: 0.035581564903259276
key: score_time
value: [0.01244569 0.0128181 0.01300859 0.012604 0.0123992 0.01992011
0.01752043 0.01252294 0.01711726 0.01253414]
mean value: 0.014289045333862304
key: test_mcc
value: [0.65273779 0.35805744 0.43386092 0.3380617 0.42085785 0.58615222
0.73720764 0.66651249 0.62044826 0.50908452]
mean value: 0.5322980837799182
key: train_mcc
value: [0.68348613 0.29359034 0.59278749 0.4467915 0.44671605 0.74603073
0.6997302 0.69671464 0.56654792 0.69518732]
mean value: 0.5867582318483535
key: test_accuracy
value: [0.80681818 0.61363636 0.68181818 0.63636364 0.66666667 0.79310345
0.86206897 0.81609195 0.79310345 0.74712644]
mean value: 0.7416797283176594
key: train_accuracy
value: [0.82569975 0.58142494 0.76463104 0.67557252 0.67090216 0.8729352
0.84879288 0.83227446 0.75984752 0.8386277 ]
mean value: 0.7670708168035927
key: test_fscore
value: [0.83495146 0.37037037 0.75 0.48387097 0.50847458 0.79069767
0.87234043 0.84313725 0.75675676 0.71794872]
mean value: 0.6928548200252127
key: train_fscore
value: [0.84861878 0.2832244 0.807892 0.53038674 0.51588785 0.87179487
0.85470085 0.8539823 0.69952305 0.81779053]
mean value: 0.708380139104571
key: test_precision
value: [0.72881356 1. 0.61764706 0.83333333 0.9375 0.79069767
0.80392157 0.74137931 0.93333333 0.82352941]
mean value: 0.8210155249967819
key: train_precision
value: [0.75 0.98484848 0.68245614 0.96 0.9787234 0.88082902
0.82352941 0.7553816 0.93220339 0.9375 ]
mean value: 0.868547145129061
key: test_recall
value: [0.97727273 0.22727273 0.95454545 0.34090909 0.34883721 0.79069767
0.95348837 0.97727273 0.63636364 0.63636364]
mean value: 0.6843023255813954
key: train_recall
value: [0.97709924 0.1653944 0.98982188 0.36641221 0.35025381 0.86294416
0.88832487 0.9821883 0.55979644 0.72519084]
mean value: 0.6867426150527635
key: test_roc_auc
value: [0.80681818 0.61363636 0.68181818 0.63636364 0.66305497 0.79307611
0.86310782 0.81421776 0.794926 0.74841438]
mean value: 0.7415433403805497
key: train_roc_auc
value: [0.82569975 0.58142494 0.76463104 0.67557252 0.67131011 0.87294791
0.84874259 0.83246471 0.75959365 0.83848374]
mean value: 0.7670870952325596
key: test_jcc
value: [0.71666667 0.22727273 0.6 0.31914894 0.34090909 0.65384615
0.77358491 0.72881356 0.60869565 0.56 ]
mean value: 0.5528937692021176
key: train_jcc
value: [0.73704415 0.16497462 0.67770035 0.36090226 0.34760705 0.77272727
0.74626866 0.74517375 0.53789731 0.69174757]
mean value: 0.5782042980076957
MCC on Blind test: 0.56
Accuracy on Blind test: 0.74
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
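The UserWarning above is expected with BaggingClassifier(oob_score=True) at its default of 10 estimators: a sample drawn into every bootstrap never gets an out-of-bag prediction, and its row in the OOB decision function becomes 0/0 (hence the RuntimeWarning). The sketch below estimates how often that happens. The training-fold size of 786 is an assumption inferred from the fold accuracies, and the helper name prob_never_oob is made up for illustration.

```python
def prob_never_oob(n_samples: int, n_estimators: int) -> float:
    """Probability that one sample lands in every bootstrap (so it has no OOB vote).

    Assumes each estimator draws n_samples rows with replacement,
    i.e. bootstrap=True and max_samples=1.0.
    """
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators


n_train = 786  # assumed training-fold size, not stated in the log
for n_estimators in (10, 50, 100):
    p = prob_never_oob(n_train, n_estimators)
    print(
        f"n_estimators={n_estimators:>3}: P(no OOB vote)={p:.3e}, "
        f"expected samples without OOB={p * n_train:.3f}"
    )
```

Under these assumptions roughly 1% of samples have no OOB vote at 10 estimators, so raising n_estimators would likely silence both warnings.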
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.25977039 0.24408913 0.24532914 0.24836373 0.24611473 0.24944425
0.25362062 0.25479889 0.24295735 0.24418402]
mean value: 0.24886722564697267
key: score_time
value: [0.0159719 0.01603484 0.01604414 0.01655626 0.01659632 0.01704359
0.01575017 0.01572776 0.0158062 0.01587892]
mean value: 0.016141009330749512
key: test_mcc
value: [0.79566006 0.70618882 0.79730996 0.73029674 0.68066848 0.83932347
0.7951307 0.79480784 0.79334038 0.81606765]
mean value: 0.7748794103853376
key: train_mcc
value: [0.8909343 0.89850975 0.88584325 0.89602536 0.89120603 0.89603807
0.90102964 0.88823013 0.88836928 0.88587654]
mean value: 0.8922062350735477
key: test_accuracy
value: [0.89772727 0.85227273 0.89772727 0.86363636 0.83908046 0.91954023
0.89655172 0.89655172 0.89655172 0.90804598]
mean value: 0.8867685475444096
key: train_accuracy
value: [0.94529262 0.94910941 0.94274809 0.94783715 0.94536213 0.94790343
0.95044473 0.94409149 0.94409149 0.94282084]
mean value: 0.9459701381546828
key: test_fscore
value: [0.89655172 0.85714286 0.9010989 0.86956522 0.82926829 0.91954023
0.8988764 0.9010989 0.89655172 0.90909091]
mean value: 0.8878785161161101
key: train_fscore
value: [0.94604768 0.94974874 0.94353827 0.94855709 0.9463171 0.94855709
0.9509434 0.9443038 0.94458438 0.94339623]
mean value: 0.9465993775790983
key: test_precision
value: [0.90697674 0.82978723 0.87234043 0.83333333 0.87179487 0.90909091
0.86956522 0.87234043 0.90697674 0.90909091]
mean value: 0.8781296814179803
key: train_precision
value: [0.93316832 0.93796526 0.93069307 0.93564356 0.93120393 0.93796526
0.94264339 0.9395466 0.93516209 0.93283582]
mean value: 0.9356827309466825
key: test_recall
value: [0.88636364 0.88636364 0.93181818 0.90909091 0.79069767 0.93023256
0.93023256 0.93181818 0.88636364 0.90909091]
mean value: 0.8992071881606765
key: train_recall
value: [0.95928753 0.96183206 0.956743 0.96183206 0.96192893 0.95939086
0.95939086 0.94910941 0.95419847 0.95419847]
mean value: 0.9577911677710182
key: test_roc_auc
value: [0.89772727 0.85227273 0.89772727 0.86363636 0.83853066 0.91966173
0.89693446 0.89614165 0.89667019 0.90803383]
mean value: 0.8867336152219872
key: train_roc_auc
value: [0.94529262 0.94910941 0.94274809 0.94783715 0.94534106 0.94788882
0.95043334 0.94409785 0.94410431 0.94283528]
mean value: 0.9459687939964609
key: test_jcc
value: [0.8125 0.75 0.82 0.76923077 0.70833333 0.85106383
0.81632653 0.82 0.8125 0.83333333]
mean value: 0.7993287796296915
key: train_jcc
value: [0.89761905 0.90430622 0.89311164 0.90214797 0.89810427 0.90214797
0.90647482 0.89448441 0.89498807 0.89285714]
mean value: 0.8986241557090046
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.22399497 0.2275095 0.24593782 0.22993827 0.23426127 0.1324122
0.22641897 0.228791 0.22016549 0.2273438 ]
mean value: 0.21967732906341553
key: score_time
value: [0.03998446 0.04093313 0.03987241 0.04082918 0.04121041 0.0396142
0.02495956 0.04249907 0.04032087 0.03757215]
mean value: 0.03877954483032227
key: test_mcc
value: [0.86452993 0.82589664 0.81818182 0.79566006 0.79334038 0.74735729
0.83923862 0.77312462 0.81702814 0.81606765]
mean value: 0.809042516543566
key: train_mcc
value: [0.99239533 0.99492383 0.98982188 0.98730612 0.98476502 0.97738462
0.98729673 0.98732207 0.98480289 0.98480289]
mean value: 0.9870821365169552
key: test_accuracy
value: [0.93181818 0.90909091 0.90909091 0.89772727 0.89655172 0.87356322
0.91954023 0.88505747 0.90804598 0.90804598]
mean value: 0.9038531870428422
key: train_accuracy
value: [0.99618321 0.99745547 0.99491094 0.99363868 0.99237611 0.98856417
0.99364676 0.99364676 0.99237611 0.99237611]
mean value: 0.9935174318037059
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: test_fscore
value: [0.93333333 0.91489362 0.90909091 0.8988764 0.89655172 0.87356322
0.91764706 0.89130435 0.90697674 0.90909091]
mean value: 0.9051328266395209
key: train_fscore
value: [0.99616858 0.99746193 0.99491094 0.9936143 0.99236641 0.98844673
0.99364676 0.9936143 0.99232737 0.99232737]
mean value: 0.9934884690795172
key: test_precision
value: [0.91304348 0.86 0.90909091 0.88888889 0.88636364 0.86363636
0.92857143 0.85416667 0.92857143 0.90909091]
mean value: 0.8941423709141101
key: train_precision
value: [1. 0.99493671 0.99491094 0.9974359 0.99489796 1.
0.99491094 0.9974359 0.99742931 0.99742931]
mean value: 0.9969386957693075
key: test_recall
value: [0.95454545 0.97727273 0.90909091 0.90909091 0.90697674 0.88372093
0.90697674 0.93181818 0.88636364 0.90909091]
mean value: 0.9174947145877378
key: train_recall
value: [0.99236641 1. 0.99491094 0.98982188 0.98984772 0.97715736
0.99238579 0.98982188 0.98727735 0.98727735]
mean value: 0.9900866689916172
key: test_roc_auc
value: [0.93181818 0.90909091 0.90909091 0.89772727 0.89667019 0.87367865
0.91939746 0.88451374 0.9082981 0.90803383]
mean value: 0.9038319238900634
key: train_roc_auc
value: [0.99618321 0.99745547 0.99491094 0.99363868 0.99237933 0.98857868
0.99364836 0.99364191 0.99236964 0.99236964]
mean value: 0.9935175856679712
key: test_jcc
value: [0.875 0.84313725 0.83333333 0.81632653 0.8125 0.7755102
0.84782609 0.80392157 0.82978723 0.83333333]
mean value: 0.8270675545889031
key: train_jcc
value: [0.99236641 0.99493671 0.98987342 0.98730964 0.98484848 0.97715736
0.98737374 0.98730964 0.98477157 0.98477157]
mean value: 0.9870718557972555
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.34912157 0.47804213 0.33337474 0.45919132 0.30793595 0.29898429
0.39014459 0.32371235 0.44865108 0.29094195]
mean value: 0.3680099964141846
key: score_time
value: [0.03037262 0.0336256 0.01945448 0.01939178 0.01974702 0.01969528
0.0238955 0.019871 0.03380036 0.03445649]
mean value: 0.025431013107299803
key: test_mcc
value: [0.66759342 0.50471461 0.54601891 0.57551157 0.5504913 0.50908452
0.47273749 0.45482695 0.56980678 0.54466285]
mean value: 0.5395448399020225
key: train_mcc
value: [0.92972888 0.93021184 0.92995819 0.92243746 0.91770005 0.9198622
0.92011198 0.92012314 0.92818308 0.91987241]
mean value: 0.9238189213913246
key: test_accuracy
value: [0.82954545 0.75 0.77272727 0.78409091 0.77011494 0.74712644
0.73563218 0.72413793 0.7816092 0.77011494]
mean value: 0.7665099268547544
key: train_accuracy
value: [0.96437659 0.96437659 0.96437659 0.9605598 0.95806861 0.95933926
0.95933926 0.95933926 0.96315121 0.95933926]
mean value: 0.961226644163587
key: test_fscore
value: [0.84210526 0.76595745 0.76744186 0.8 0.78723404 0.77083333
0.74157303 0.75 0.8 0.78723404]
mean value: 0.7812379022579103
key: train_fscore
value: [0.96517413 0.96534653 0.96526055 0.96158612 0.95930949 0.96039604
0.96049383 0.96039604 0.96424168 0.96029777]
mean value: 0.9622502175860964
key: test_precision
value: [0.78431373 0.72 0.78571429 0.74509804 0.7254902 0.69811321
0.7173913 0.69230769 0.74509804 0.74 ]
mean value: 0.7353526489916974
key: train_precision
value: [0.94403893 0.93975904 0.94188862 0.93719807 0.93285372 0.93719807
0.93509615 0.93493976 0.9354067 0.937046 ]
mean value: 0.9375425054021276
key: test_recall
value: [0.90909091 0.81818182 0.75 0.86363636 0.86046512 0.86046512
0.76744186 0.81818182 0.86363636 0.84090909]
mean value: 0.835200845665962
key: train_recall
value: [0.98727735 0.99236641 0.98982188 0.98727735 0.98730964 0.98477157
0.98730964 0.98727735 0.99491094 0.98473282]
mean value: 0.9883054985081567
key: test_roc_auc
value: [0.82954545 0.75 0.77272727 0.78409091 0.77114165 0.74841438
0.73599366 0.7230444 0.78065539 0.76929175]
mean value: 0.7664904862579281
key: train_roc_auc
value: [0.96437659 0.96437659 0.96437659 0.9605598 0.95803141 0.95930691
0.95930368 0.95937472 0.96319151 0.95937149]
mean value: 0.9612269280944447
key: test_jcc
value: [0.72727273 0.62068966 0.62264151 0.66666667 0.64912281 0.62711864
0.58928571 0.6 0.66666667 0.64912281]
mean value: 0.6418587197601036
key: train_jcc
value: [0.93269231 0.93301435 0.93285372 0.92601432 0.92180095 0.92380952
0.9239905 0.92380952 0.93095238 0.92362768]
mean value: 0.927256525881002
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.08939147 1.07237315 1.0702858 1.06541538 1.07314181 1.07619619
1.07927656 1.07755518 1.0714097 1.0669136 ]
mean value: 1.074195885658264
key: score_time
value: [0.01029015 0.00958037 0.01048279 0.00957704 0.00974631 0.00978112
0.00978923 0.00957084 0.00987887 0.00948596]
mean value: 0.009818267822265626
key: test_mcc
value: [0.88659264 0.84639167 0.81818182 0.84287052 0.81702814 0.86303555
0.93329922 0.79480784 0.81702814 0.83923862]
mean value: 0.8458474162871303
key: train_mcc
value: [0.9567461 0.9567461 0.94912171 0.96437659 0.96443403 0.95695029
0.97207422 0.96696611 0.94918593 0.95426922]
mean value: 0.959087029355719
key: test_accuracy
value: [0.94318182 0.92045455 0.90909091 0.92045455 0.90804598 0.93103448
0.96551724 0.89655172 0.90804598 0.91954023]
mean value: 0.9221917450365726
key: train_accuracy
value: [0.9783715 0.9783715 0.97455471 0.9821883 0.98221093 0.97839898
0.98602287 0.98348158 0.97458704 0.97712834]
mean value: 0.9795315738252972
key: test_fscore
value: [0.94252874 0.92473118 0.90909091 0.92307692 0.90909091 0.93181818
0.96629213 0.9010989 0.90697674 0.92134831]
mean value: 0.9236052936227956
key: train_fscore
value: [0.97834395 0.97834395 0.9744898 0.9821883 0.98227848 0.97823303
0.98598726 0.98343949 0.9744898 0.97715736]
mean value: 0.979495141267347
key: test_precision
value: [0.95348837 0.87755102 0.90909091 0.89361702 0.88888889 0.91111111
0.93478261 0.87234043 0.92857143 0.91111111]
mean value: 0.9080552896778799
key: train_precision
value: [0.97959184 0.97959184 0.9769821 0.9821883 0.97979798 0.9870801
0.98976982 0.98469388 0.9769821 0.97468354]
mean value: 0.9811361488992022
key: test_recall
value: [0.93181818 0.97727273 0.90909091 0.95454545 0.93023256 0.95348837
1. 0.93181818 0.88636364 0.93181818]
mean value: 0.9406448202959831
key: train_recall
value: [0.97709924 0.97709924 0.97201018 0.9821883 0.98477157 0.96954315
0.9822335 0.9821883 0.97201018 0.97964377]
mean value: 0.977878740910089
key: test_roc_auc
value: [0.94318182 0.92045455 0.90909091 0.92045455 0.9082981 0.93128964
0.96590909 0.89614165 0.9082981 0.91939746]
mean value: 0.9222515856236786
key: train_roc_auc
value: [0.9783715 0.9783715 0.97455471 0.9821883 0.98220767 0.97841025
0.98602769 0.98347993 0.97458377 0.97713153]
mean value: 0.9795326849304452
key: test_jcc
value: [0.89130435 0.86 0.83333333 0.85714286 0.83333333 0.87234043
0.93478261 0.82 0.82978723 0.85416667]
mean value: 0.8586190806572398
key: train_jcc
value: [0.95760599 0.95760599 0.95024876 0.965 0.96517413 0.95739348
0.97236181 0.96741855 0.95024876 0.95533499]
mean value: 0.9598392438579324
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03878379 0.03929353 0.03901601 0.04018807 0.04464102 0.04195595
0.03913212 0.03936505 0.04722381 0.04217124]
mean value: 0.04117705821990967
key: score_time
value: [0.01283026 0.01283789 0.01293039 0.01398873 0.01305914 0.01290655
0.01287484 0.01305056 0.01294804 0.01292896]
mean value: 0.01303553581237793
key: test_mcc
value: [0.14322297 0.12598816 0.10910895 0.03750293 0.15546399 0.04655125
0.21701954 0.2497872 0.15100772 0.19262997]
mean value: 0.14282826822070938
key: train_mcc
value: [0.24351425 0.24932341 0.24643203 0.25784831 0.2316976 0.24947191
0.23773786 0.22804979 0.25734636 0.23413734]
mean value: 0.24355588486096966
key: test_accuracy
value: [0.54545455 0.53409091 0.52272727 0.51136364 0.54022989 0.50574713
0.54022989 0.56321839 0.54022989 0.56321839]
mean value: 0.5366509926854754
key: train_accuracy
value: [0.55597964 0.55852417 0.55725191 0.56234097 0.55146125 0.55908513
0.55400254 0.54891995 0.56162643 0.55146125]
mean value: 0.5560653235949317
key: test_fscore
value: [0.67213115 0.672 0.671875 0.6504065 0.67213115 0.656
0.68253968 0.6984127 0.68253968 0.68852459]
mean value: 0.6746560452803005
key: train_fscore
value: [0.69251101 0.69373345 0.69312169 0.69557522 0.69062226 0.69427313
0.69183494 0.68886941 0.69496021 0.69007902]
mean value: 0.6925580352130288
key: test_precision
value: [0.52564103 0.51851852 0.51190476 0.50632911 0.51898734 0.5
0.51807229 0.53658537 0.52439024 0.53846154]
mean value: 0.5198890199134771
key: train_precision
value: [0.5296496 0.53108108 0.53036437 0.53324288 0.52744311 0.5317139
0.52885906 0.52540107 0.53252033 0.52680965]
mean value: 0.5297085038255003
key: test_recall
value: [0.93181818 0.95454545 0.97727273 0.90909091 0.95348837 0.95348837
1. 1. 0.97727273 0.95454545]
mean value: 0.9611522198731501
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.54545455 0.53409091 0.52272727 0.51136364 0.544926 0.5108351
0.54545455 0.55813953 0.53514799 0.55866808]
mean value: 0.5366807610993658
key: train_roc_auc
value: [0.55597964 0.55852417 0.55725191 0.56234097 0.55089059 0.55852417
0.55343511 0.54949239 0.56218274 0.55203046]
mean value: 0.5560652148641841
key: test_jcc
value: [0.50617284 0.5060241 0.50588235 0.48192771 0.50617284 0.48809524
0.51807229 0.53658537 0.51807229 0.525 ]
mean value: 0.5092005021444588
key: train_jcc
value: [0.5296496 0.53108108 0.53036437 0.53324288 0.52744311 0.5317139
0.52885906 0.52540107 0.53252033 0.52680965]
mean value: 0.5297085038255003
MCC on Blind test: 0.06
Accuracy on Blind test: 0.45
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01776242 0.01769638 0.01799273 0.04317951 0.04338765 0.04285359
0.03382993 0.02006269 0.01762819 0.04450798]
mean value: 0.029890108108520507
key: score_time
value: [0.01360846 0.012321 0.01231194 0.01928067 0.01912475 0.01911569
0.01230025 0.01963973 0.0123415 0.01916599]
mean value: 0.01592099666595459
key: test_mcc
value: [0.64236405 0.62155249 0.47838597 0.57188626 0.65696218 0.54295079
0.67038474 0.62173301 0.58655447 0.65539112]
mean value: 0.6048165102926117
key: train_mcc
value: [0.73960469 0.73336325 0.72349182 0.73847379 0.73094792 0.7156377
0.72143309 0.74631504 0.73645742 0.72660979]
mean value: 0.7312334508927151
key: test_accuracy
value: [0.81818182 0.80681818 0.73863636 0.78409091 0.82758621 0.77011494
0.82758621 0.8045977 0.79310345 0.82758621]
mean value: 0.799830198537095
key: train_accuracy
value: [0.86768448 0.86513995 0.86005089 0.86768448 0.86404066 0.85641677
0.85895807 0.87166455 0.86658196 0.86149936]
mean value: 0.8639721168737532
key: test_fscore
value: [0.82978723 0.82105263 0.74725275 0.79569892 0.83146067 0.77777778
0.84210526 0.82474227 0.8 0.82758621]
mean value: 0.8097463727636196
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.87439614 0.87104623 0.86650485 0.87347932 0.86998785 0.86269745
0.86577993 0.87697929 0.87241798 0.86787879]
mean value: 0.870116782663218
key: test_precision
value: [0.78 0.76470588 0.72340426 0.75510204 0.80434783 0.74468085
0.76923077 0.75471698 0.7826087 0.8372093 ]
mean value: 0.7716006603979804
key: train_precision
value: [0.83218391 0.83449883 0.82830626 0.83682984 0.83449883 0.82750583
0.82678984 0.8411215 0.83488372 0.8287037 ]
mean value: 0.8325322264178692
key: test_recall
value: [0.88636364 0.88636364 0.77272727 0.84090909 0.86046512 0.81395349
0.93023256 0.90909091 0.81818182 0.81818182]
mean value: 0.8536469344608879
key: train_recall
value: [0.92111959 0.91094148 0.90839695 0.91348601 0.90862944 0.90101523
0.90862944 0.91603053 0.91348601 0.91094148]
mean value: 0.9112676147298536
key: test_roc_auc
value: [0.81818182 0.80681818 0.73863636 0.78409091 0.82795983 0.77061311
0.82875264 0.80338266 0.79281184 0.82769556]
mean value: 0.7998942917547569
key: train_roc_auc
value: [0.86768448 0.86513995 0.86005089 0.86768448 0.86398393 0.85636003
0.85889487 0.87172085 0.86664148 0.86156211]
mean value: 0.8639723072551375
key: test_jcc
value: [0.70909091 0.69642857 0.59649123 0.66071429 0.71153846 0.63636364
0.72727273 0.70175439 0.66666667 0.70588235]
mean value: 0.6812203225051522
key: train_jcc
value: [0.77682403 0.77155172 0.76445396 0.77537797 0.76989247 0.75854701
0.76332623 0.78091106 0.7737069 0.76659529]
mean value: 0.7701186645906976
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.33544517 0.34822321 0.32937121 0.32755113 0.33985925 0.4299016
0.33088613 0.3283422 0.32882071 0.38829613]
mean value: 0.3486696720123291
key: score_time
value: [0.01930285 0.01924634 0.01923442 0.01921296 0.01925468 0.01923203
0.01920033 0.01935935 0.01922727 0.01912498]
mean value: 0.019239521026611327
key: test_mcc
value: [0.64236405 0.61763716 0.47838597 0.57188626 0.65696218 0.54295079
0.67803941 0.64863047 0.58655447 0.60940803]
mean value: 0.6032818808066384
key: train_mcc
value: [0.73960469 0.75496449 0.72349182 0.73847379 0.73094792 0.7156377
0.75012172 0.77148345 0.73645742 0.73645742]
mean value: 0.7397640414438856
key: test_accuracy
value: [0.81818182 0.80681818 0.73863636 0.78409091 0.82758621 0.77011494
0.82758621 0.81609195 0.79310345 0.8045977 ]
mean value: 0.7986807732497387
key: train_accuracy
value: [0.86768448 0.87659033 0.86005089 0.86768448 0.86404066 0.85641677
0.8729352 0.88437103 0.86658196 0.86658196]
mean value: 0.868293775117931
key: test_fscore
value: [0.82978723 0.8172043 0.74725275 0.79569892 0.83146067 0.77777778
0.84536082 0.83673469 0.8 0.8045977 ]
mean value: 0.8085874878806077
key: train_fscore
value: [0.87439614 0.88068881 0.86650485 0.87347932 0.86998785 0.86269745
0.87951807 0.88888889 0.87241798 0.87241798]
mean value: 0.8740997340105042
key: test_precision
value: [0.78 0.7755102 0.72340426 0.75510204 0.80434783 0.74468085
0.75925926 0.75925926 0.7826087 0.81395349]
mean value: 0.769812587991068
key: train_precision
value: [0.83218391 0.85238095 0.82830626 0.83682984 0.83449883 0.82750583
0.83715596 0.85446009 0.83488372 0.83488372]
mean value: 0.8373089122822519
key: test_recall
value: [0.88636364 0.86363636 0.77272727 0.84090909 0.86046512 0.81395349
0.95348837 0.93181818 0.81818182 0.79545455]
mean value: 0.8536997885835095
key: train_recall
value: [0.92111959 0.91094148 0.90839695 0.91348601 0.90862944 0.90101523
0.92639594 0.92620865 0.91348601 0.91348601]
mean value: 0.9143165291070898
key: test_roc_auc
value: [0.81818182 0.80681818 0.73863636 0.78409091 0.82795983 0.77061311
0.82901691 0.8147463 0.79281184 0.80470402]
mean value: 0.7987579281183932
key: train_roc_auc
value: [0.86768448 0.87659033 0.86005089 0.86768448 0.86398393 0.85636003
0.87286718 0.88442412 0.86664148 0.86664148]
mean value: 0.8682928404438072
key: test_jcc
value: [0.70909091 0.69090909 0.59649123 0.66071429 0.71153846 0.63636364
0.73214286 0.71929825 0.66666667 0.67307692]
mean value: 0.6796292304187042
key: train_jcc
value: [0.77682403 0.78681319 0.76445396 0.77537797 0.76989247 0.75854701
0.78494624 0.8 0.7737069 0.7737069 ]
mean value: 0.7764268663694349
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04968023 0.03789711 0.07179022 0.05948281 0.05385971 0.05733061
0.03771996 0.03850865 0.0391221 0.03841591]
mean value: 0.04838073253631592
key: score_time
value: [0.01359701 0.01272964 0.02741122 0.01486683 0.01206851 0.01363039
0.0127759 0.0128696 0.0128572 0.01296473]
mean value: 0.014577102661132813
key: test_mcc
value: [0.44539933 0.47082362 0.62622429 0.40644851 0.5809475 0.59404013
0.5957539 0.61895161 0.49193548 0.52371369]
mean value: 0.5354238070669484
key: train_mcc
value: [0.70137886 0.69003127 0.68667602 0.69273127 0.65895585 0.66630992
0.67291968 0.66568726 0.65836157 0.69060208]
mean value: 0.678365378762692
key: test_accuracy
value: [0.71875 0.734375 0.8125 0.703125 0.78125 0.796875
0.79365079 0.80952381 0.74603175 0.76190476]
mean value: 0.7657986111111111
key: train_accuracy
value: [0.84965035 0.84440559 0.84265734 0.84615385 0.82867133 0.83216783
0.83595113 0.83246073 0.82897033 0.84467714]
mean value: 0.8385765630530029
key: test_fscore
value: [0.74285714 0.72131148 0.80645161 0.70769231 0.80555556 0.8
0.80597015 0.80645161 0.75 0.76923077]
mean value: 0.7715520625805794
key: train_fscore
value: [0.85521886 0.84889643 0.84745763 0.84879725 0.83445946 0.83838384
0.84067797 0.83673469 0.83161512 0.84889643]
mean value: 0.8431137680564013
key: test_precision
value: [0.68421053 0.75862069 0.83333333 0.6969697 0.725 0.78787879
0.75 0.80645161 0.75 0.75757576]
mean value: 0.7550040404631764
key: train_precision
value: [0.82467532 0.82508251 0.82236842 0.83445946 0.80718954 0.80844156
0.81848185 0.81727575 0.81756757 0.82508251]
mean value: 0.8200624485874977
key: test_recall
value: [0.8125 0.6875 0.78125 0.71875 0.90625 0.8125
0.87096774 0.80645161 0.75 0.78125 ]
mean value: 0.792741935483871
key: train_recall
value: [0.88811189 0.87412587 0.87412587 0.86363636 0.86363636 0.87062937
0.8641115 0.85714286 0.84615385 0.87412587]
mean value: 0.8675799809946152
key: test_roc_auc
value: [0.71875 0.734375 0.8125 0.703125 0.78125 0.796875
0.79485887 0.80947581 0.74596774 0.76159274]
mean value: 0.7658770161290323
key: train_roc_auc
value: [0.84965035 0.84440559 0.84265734 0.84615385 0.82867133 0.83216783
0.8359019 0.83241758 0.82900027 0.84472844]
mean value: 0.8385754489413026
key: test_jcc
value: [0.59090909 0.56410256 0.67567568 0.54761905 0.6744186 0.66666667
0.675 0.67567568 0.6 0.625 ]
mean value: 0.6295067325299883
key: train_jcc
value: [0.74705882 0.73746313 0.73529412 0.73731343 0.71594203 0.72173913
0.7251462 0.71929825 0.71176471 0.73746313]
mean value: 0.7288482937446694
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.81775093 0.97031665 0.86882544 0.84967637 1.06103253 1.10017157
1.02854228 0.88756537 0.91167998 0.95300126]
mean value: 0.9448562383651733
key: score_time
value: [0.01471949 0.01237798 0.01467729 0.01467299 0.01483011 0.01496291
0.01218915 0.01475215 0.01921082 0.01462936]
mean value: 0.014702224731445312
key: test_mcc
value: [0.4163332 0.50097943 0.625 0.53150959 0.474579 0.65657067
0.62475802 0.61982085 0.5570134 0.52371369]
mean value: 0.5530277845963453
key: train_mcc
value: [0.73177639 0.74552581 0.72871741 0.72814564 0.80829038 0.79480019
0.74257475 0.79849292 0.79827133 0.73925674]
mean value: 0.7615851555815423
key: test_accuracy
value: [0.703125 0.75 0.8125 0.765625 0.734375 0.828125
0.80952381 0.80952381 0.77777778 0.76190476]
mean value: 0.7752480158730158
key: train_accuracy
value: [0.86538462 0.87237762 0.86363636 0.86363636 0.90384615 0.89685315
0.87085515 0.89877836 0.89877836 0.86910995]
mean value: 0.8803256080742992
key: test_fscore
value: [0.73239437 0.74193548 0.8125 0.76923077 0.75362319 0.82539683
0.81818182 0.8 0.77419355 0.76923077]
mean value: 0.7796686768901226
key: train_fscore
value: [0.86882453 0.87521368 0.86779661 0.8668942 0.90566038 0.89948893
0.87414966 0.90136054 0.90068493 0.87223169]
mean value: 0.8832305141086446
key: test_precision
value: [0.66666667 0.76666667 0.8125 0.75757576 0.7027027 0.83870968
0.77142857 0.82758621 0.8 0.75757576]
mean value: 0.7701412006932029
key: train_precision
value: [0.84717608 0.85618729 0.84210526 0.84666667 0.88888889 0.87707641
0.8538206 0.88039867 0.88255034 0.85049834]
mean value: 0.8625368544921593
key: test_recall
value: [0.8125 0.71875 0.8125 0.78125 0.8125 0.8125
0.87096774 0.77419355 0.75 0.78125 ]
mean value: 0.792641129032258
key: train_recall
value: [0.89160839 0.8951049 0.8951049 0.88811189 0.92307692 0.92307692
0.89547038 0.92334495 0.91958042 0.8951049 ]
mean value: 0.9049584561779683
key: test_roc_auc
value: [0.703125 0.75 0.8125 0.765625 0.734375 0.828125
0.81048387 0.80897177 0.77822581 0.76159274]
mean value: 0.7753024193548387
key: train_roc_auc
value: [0.86538462 0.87237762 0.86363636 0.86363636 0.90384615 0.89685315
0.87081211 0.89873541 0.8988146 0.86915524]
mean value: 0.8803251626422358
key: test_jcc
value: [0.57777778 0.58974359 0.68421053 0.625 0.60465116 0.7027027
0.69230769 0.66666667 0.63157895 0.625 ]
mean value: 0.6399639065673337
key: train_jcc
value: [0.76807229 0.7781155 0.76646707 0.76506024 0.82758621 0.81733746
0.77643505 0.82043344 0.81931464 0.7734139 ]
mean value: 0.7912235786580607
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01438069 0.0107131 0.01069641 0.01035666 0.0103178 0.01025701
0.01074266 0.01153016 0.01023555 0.01032662]
mean value: 0.01095566749572754
key: score_time
value: [0.01253676 0.00952673 0.00941396 0.00904965 0.00897145 0.00896692
0.00892973 0.00899005 0.00900006 0.00900984]
mean value: 0.009439516067504882
key: test_mcc
value: [0.46897905 0.46056619 0.35820928 0.25048972 0.19364917 0.28823068
0.3393548 0.33367758 0.49344122 0.4089525 ]
mean value: 0.35955501764481224
key: train_mcc
value: [0.37965772 0.4185499 0.39072156 0.46245528 0.41614768 0.36263664
0.37967833 0.37573579 0.4409701 0.39252541]
mean value: 0.40190783982214684
key: test_accuracy
value: [0.734375 0.71875 0.671875 0.625 0.59375 0.640625
0.66666667 0.66666667 0.74603175 0.6984127 ]
mean value: 0.6762152777777778
key: train_accuracy
value: [0.68531469 0.70804196 0.69405594 0.73076923 0.7027972 0.67657343
0.68760908 0.68237347 0.71553229 0.69284468]
mean value: 0.6975911958896253
key: test_fscore
value: [0.73015873 0.66666667 0.61818182 0.63636364 0.53571429 0.59649123
0.61818182 0.6440678 0.74193548 0.66666667]
mean value: 0.6454428130484935
key: train_fscore
value: [0.64705882 0.69131238 0.67532468 0.73898305 0.66535433 0.63510848
0.66290019 0.64031621 0.68101761 0.66023166]
mean value: 0.6697607412759368
key: test_precision
value: [0.74193548 0.81818182 0.73913043 0.61764706 0.625 0.68
0.70833333 0.67857143 0.76666667 0.76 ]
mean value: 0.7135466224230352
key: train_precision
value: [0.73660714 0.73333333 0.71936759 0.71710526 0.76126126 0.72850679
0.72131148 0.73972603 0.77333333 0.73706897]
mean value: 0.7367621178530426
key: test_recall
value: [0.71875 0.5625 0.53125 0.65625 0.46875 0.53125
0.5483871 0.61290323 0.71875 0.59375 ]
mean value: 0.5942540322580645
key: train_recall
value: [0.57692308 0.65384615 0.63636364 0.76223776 0.59090909 0.56293706
0.61324042 0.56445993 0.60839161 0.5979021 ]
mean value: 0.6167210837942545
key: test_roc_auc
value: [0.734375 0.71875 0.671875 0.625 0.59375 0.640625
0.66481855 0.66582661 0.74647177 0.70010081]
mean value: 0.6761592741935484
key: train_roc_auc
value: [0.68531469 0.70804196 0.69405594 0.73076923 0.7027972 0.67657343
0.68773909 0.68257962 0.71534563 0.69267927]
mean value: 0.6975896055164348
key: test_jcc
value: [0.575 0.5 0.44736842 0.46666667 0.36585366 0.425
0.44736842 0.475 0.58974359 0.5 ]
mean value: 0.4792000757052105
key: train_jcc
value: [0.47826087 0.52824859 0.50980392 0.58602151 0.49852507 0.46531792
0.49577465 0.47093023 0.51632047 0.49279539]
mean value: 0.5041998621174171
MCC on Blind test: 0.53
Accuracy on Blind test: 0.77
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01077843 0.01050925 0.01047873 0.01048326 0.0104928 0.01047349
0.01060224 0.01083875 0.01082993 0.01076794]
mean value: 0.010625481605529785
key: score_time
value: [0.00895715 0.00905252 0.00902224 0.00900888 0.00898409 0.0090363
0.00899148 0.0090642 0.00913739 0.00918102]
mean value: 0.009043526649475098
key: test_mcc
value: [0.31814238 0.34527065 0.44539933 0.34391797 0.42333825 0.62622429
0.36629686 0.42842742 0.33367758 0.49193548]
mean value: 0.4122630210919585
key: train_mcc
value: [0.53152212 0.51229647 0.48732947 0.48309663 0.4795166 0.51083262
0.46627502 0.46655817 0.5205467 0.5054141 ]
mean value: 0.4963387918137142
key: test_accuracy
value: [0.65625 0.671875 0.71875 0.671875 0.703125 0.8125
0.68253968 0.71428571 0.66666667 0.74603175]
mean value: 0.704389880952381
key: train_accuracy
value: [0.76398601 0.75524476 0.74300699 0.74125874 0.73951049 0.7534965
0.73298429 0.73298429 0.7591623 0.7521815 ]
mean value: 0.7473815887428453
key: test_fscore
value: [0.68571429 0.6557377 0.68965517 0.67692308 0.73972603 0.81818182
0.6875 0.70967742 0.68656716 0.75 ]
mean value: 0.709968266908221
key: train_fscore
value: [0.7768595 0.76510067 0.75210793 0.74744027 0.74529915 0.76771005
0.73846154 0.74023769 0.76923077 0.75932203]
mean value: 0.7561769601426575
key: test_precision
value: [0.63157895 0.68965517 0.76923077 0.66666667 0.65853659 0.79411765
0.66666667 0.70967742 0.65714286 0.75 ]
mean value: 0.6993272731268689
key: train_precision
value: [0.73667712 0.73548387 0.72638436 0.73 0.72909699 0.7258567
0.72483221 0.7218543 0.73717949 0.73684211]
mean value: 0.7304207151405426
key: test_recall
value: [0.75 0.625 0.625 0.6875 0.84375 0.84375
0.70967742 0.70967742 0.71875 0.75 ]
mean value: 0.7263104838709677
key: train_recall
value: [0.82167832 0.7972028 0.77972028 0.76573427 0.76223776 0.81468531
0.75261324 0.75958188 0.8041958 0.78321678]
mean value: 0.7840866450622548
key: test_roc_auc
value: [0.65625 0.671875 0.71875 0.671875 0.703125 0.8125
0.68296371 0.71421371 0.66582661 0.74596774]
mean value: 0.7043346774193548
key: train_roc_auc
value: [0.76398601 0.75524476 0.74300699 0.74125874 0.73951049 0.7534965
0.73294998 0.73293779 0.75924076 0.75223557]
mean value: 0.7473867595818815
key: test_jcc
value: [0.52173913 0.48780488 0.52631579 0.51162791 0.58695652 0.69230769
0.52380952 0.55 0.52272727 0.6 ]
mean value: 0.5523288715517611
key: train_jcc
value: [0.63513514 0.61956522 0.6027027 0.59673025 0.59400545 0.62299465
0.58536585 0.58760108 0.625 0.61202186]
mean value: 0.6081122192207598
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01007533 0.01112366 0.01109052 0.01143003 0.01140022 0.01098752
0.01147366 0.01116109 0.01144481 0.01098204]
mean value: 0.011116886138916015
key: score_time
value: [0.01728034 0.01390123 0.01371455 0.01369023 0.01358032 0.01316977
0.01375914 0.01355004 0.01325154 0.0130403 ]
mean value: 0.013893747329711914
key: test_mcc
value: [0.28249417 0.18786729 0.28138743 0.31311215 0.32897585 0.3480246
0.42986904 0.33367758 0.01513623 0.23761484]
mean value: 0.2758159171857665
key: train_mcc
value: [0.57695482 0.55611065 0.54898125 0.56711343 0.57342657 0.54204088
0.58510016 0.57416973 0.58464706 0.58468061]
mean value: 0.5693225176661459
key: test_accuracy
value: [0.640625 0.59375 0.640625 0.65625 0.65625 0.671875
0.71428571 0.66666667 0.50793651 0.61904762]
mean value: 0.6367311507936508
key: train_accuracy
value: [0.78846154 0.77797203 0.77447552 0.78321678 0.78671329 0.77097902
0.79232112 0.78708551 0.79232112 0.79232112]
mean value: 0.7845867047437728
key: test_fscore
value: [0.65671642 0.58064516 0.63492063 0.64516129 0.7027027 0.6440678
0.71875 0.6440678 0.52307692 0.63636364]
mean value: 0.6386472359807587
key: train_fscore
value: [0.78956522 0.77522124 0.77328647 0.7883959 0.78671329 0.7729636
0.78863233 0.78745645 0.79232112 0.79304348]
mean value: 0.7847599087821961
key: test_precision
value: [0.62857143 0.6 0.64516129 0.66666667 0.61904762 0.7037037
0.6969697 0.67857143 0.51515152 0.61764706]
mean value: 0.6371490407828169
key: train_precision
value: [0.78546713 0.78494624 0.77738516 0.77 0.78671329 0.76632302
0.80434783 0.78745645 0.79094077 0.78892734]
mean value: 0.784250720863634
key: test_recall
value: [0.6875 0.5625 0.625 0.625 0.8125 0.59375
0.74193548 0.61290323 0.53125 0.65625 ]
mean value: 0.644858870967742
key: train_recall
value: [0.79370629 0.76573427 0.76923077 0.80769231 0.78671329 0.77972028
0.77351916 0.78745645 0.79370629 0.7972028 ]
mean value: 0.7854681903462392
key: test_roc_auc
value: [0.640625 0.59375 0.640625 0.65625 0.65625 0.671875
0.71471774 0.66582661 0.50756048 0.61844758]
mean value: 0.6365927419354839
key: train_roc_auc
value: [0.78846154 0.77797203 0.77447552 0.78321678 0.78671329 0.77097902
0.79235399 0.78708487 0.79232353 0.79232962]
mean value: 0.7845910187373603
key: test_jcc
value: [0.48888889 0.40909091 0.46511628 0.47619048 0.54166667 0.475
0.56097561 0.475 0.35416667 0.46666667]
mean value: 0.4712762162996139
key: train_jcc
value: [0.65229885 0.63294798 0.63037249 0.65070423 0.64841499 0.6299435
0.65102639 0.64942529 0.65606936 0.65706052]
mean value: 0.6458263597269788
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.03415418 0.02976036 0.03232479 0.02934623 0.02964926 0.02974415
0.02997589 0.02972007 0.02973104 0.02964139]
mean value: 0.03040473461151123
key: score_time
value: [0.01561642 0.01400876 0.01577091 0.01404595 0.01406598 0.01601458
0.01414895 0.01402998 0.01415586 0.01414967]
mean value: 0.014600706100463868
key: test_mcc
value: [0.32274861 0.40644851 0.59404013 0.44539933 0.52636136 0.51639778
0.4969666 0.56086231 0.42842742 0.55611985]
mean value: 0.48537719052534
key: train_mcc
value: [0.66001142 0.66528266 0.6179077 0.63177564 0.62412608 0.62471669
0.64145684 0.65479702 0.63778991 0.64216428]
mean value: 0.640002823010731
key: test_accuracy
value: [0.65625 0.703125 0.796875 0.71875 0.75 0.75
0.74603175 0.77777778 0.71428571 0.77777778]
mean value: 0.7390873015873016
key: train_accuracy
value: [0.82692308 0.83041958 0.8041958 0.81293706 0.80944056 0.80944056
0.81849913 0.82373473 0.81849913 0.81849913]
mean value: 0.8172588755049488
key: test_fscore
value: [0.69444444 0.6984127 0.79365079 0.74285714 0.78378378 0.77777778
0.75757576 0.78787879 0.71875 0.78787879]
mean value: 0.7543009974259974
key: train_fscore
value: [0.83797054 0.83966942 0.81993569 0.82487725 0.82101806 0.82160393
0.82894737 0.8363047 0.8225256 0.82894737]
mean value: 0.828179992797138
key: test_precision
value: [0.625 0.70967742 0.80645161 0.68421053 0.69047619 0.7
0.71428571 0.74285714 0.71875 0.76470588]
mean value: 0.7156414488545843
key: train_precision
value: [0.78769231 0.79623824 0.75892857 0.77538462 0.77399381 0.77230769
0.78504673 0.78181818 0.80333333 0.7826087 ]
mean value: 0.7817352179152481
key: test_recall
value: [0.78125 0.6875 0.78125 0.8125 0.90625 0.875
0.80645161 0.83870968 0.71875 0.8125 ]
mean value: 0.802016129032258
key: train_recall
value: [0.8951049 0.88811189 0.89160839 0.88111888 0.87412587 0.87762238
0.87804878 0.8989547 0.84265734 0.88111888]
mean value: 0.880847201578909
key: test_roc_auc
value: [0.65625 0.703125 0.796875 0.71875 0.75 0.75
0.74697581 0.77872984 0.71421371 0.77721774]
mean value: 0.7392137096774194
key: train_roc_auc
value: [0.82692308 0.83041958 0.8041958 0.81293706 0.80944056 0.80944056
0.81839502 0.82360323 0.81854121 0.81860822]
mean value: 0.8172504324943349
key: test_jcc
value: [0.53191489 0.53658537 0.65789474 0.59090909 0.64444444 0.63636364
0.6097561 0.65 0.56097561 0.65 ]
mean value: 0.606884387534703
key: train_jcc
value: [0.72112676 0.72364672 0.69482289 0.70194986 0.69637883 0.69722222
0.70786517 0.71866295 0.69855072 0.70786517]
mean value: 0.7068091299886077
MCC on Blind test: 0.48
Accuracy on Blind test: 0.74
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.13041496 2.18040395 2.15946102 2.10226822 2.21306634 2.20666718
1.96261048 2.13868785 1.92572284 2.1737442 ]
mean value: 2.1193047046661375
key: score_time
value: [0.01501131 0.01783228 0.02522731 0.01474667 0.01502776 0.01275539
0.01284885 0.01510572 0.01248312 0.01251149]
mean value: 0.01535499095916748
key: test_mcc
value: [0.28249417 0.25 0.68884672 0.46897905 0.48038446 0.438357
0.66853948 0.58770161 0.40327957 0.52371369]
mean value: 0.4792295749104151
key: train_mcc
value: [0.97212305 0.9652474 0.95804196 0.96159136 0.95862812 0.95142124
0.92401632 0.95122576 0.93368746 0.96509588]
mean value: 0.9541078553036163
key: test_accuracy
value: [0.640625 0.625 0.84375 0.734375 0.734375 0.71875
0.82539683 0.79365079 0.6984127 0.76190476]
mean value: 0.737624007936508
key: train_accuracy
value: [0.98601399 0.98251748 0.97902098 0.98076923 0.97902098 0.97552448
0.96160558 0.97556719 0.96684119 0.98254799]
mean value: 0.9769429087491914
key: test_fscore
value: [0.65671642 0.625 0.83870968 0.73015873 0.76056338 0.70967742
0.84057971 0.79365079 0.6779661 0.76923077]
mean value: 0.7402252999846467
key: train_fscore
value: [0.98611111 0.98269896 0.97902098 0.98086957 0.97938144 0.97586207
0.96245734 0.97577855 0.96672504 0.98251748]
mean value: 0.9771422540448765
key: test_precision
value: [0.62857143 0.625 0.86666667 0.74193548 0.69230769 0.73333333
0.76315789 0.78125 0.74074074 0.75757576]
mean value: 0.7330538997803429
key: train_precision
value: [0.97931034 0.97260274 0.97902098 0.97577855 0.96283784 0.96258503
0.94314381 0.96907216 0.96842105 0.98251748]
mean value: 0.9695289994945384
key: test_recall
value: [0.6875 0.625 0.8125 0.71875 0.84375 0.6875
0.93548387 0.80645161 0.625 0.78125 ]
mean value: 0.7523185483870968
key: train_recall
value: [0.99300699 0.99300699 0.97902098 0.98601399 0.9965035 0.98951049
0.9825784 0.9825784 0.96503497 0.98251748]
mean value: 0.9849772179040471
key: test_roc_auc
value: [0.640625 0.625 0.84375 0.734375 0.734375 0.71875
0.82711694 0.79385081 0.69959677 0.76159274]
mean value: 0.7379032258064516
key: train_roc_auc
value: [0.98601399 0.98251748 0.97902098 0.98076923 0.97902098 0.97552448
0.96156892 0.97555493 0.96683804 0.98254794]
mean value: 0.9769376964498916
key: test_jcc
value: [0.48888889 0.45454545 0.72222222 0.575 0.61363636 0.55
0.725 0.65789474 0.51282051 0.625 ]
mean value: 0.5925008178955548
key: train_jcc
value: [0.97260274 0.96598639 0.95890411 0.96245734 0.95959596 0.95286195
0.92763158 0.9527027 0.93559322 0.96563574]
mean value: 0.9553971735035433
MCC on Blind test: 0.58
Accuracy on Blind test: 0.79
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.04582191 0.03966475 0.03780389 0.0354073 0.03452015 0.02935576
0.03461981 0.0351541 0.04046655 0.03755069]
mean value: 0.03703649044036865
key: score_time
value: [0.0110054 0.00905657 0.00910783 0.0090065 0.00911236 0.00901842
0.00909066 0.00909805 0.00918746 0.0090847 ]
mean value: 0.009276795387268066
key: test_mcc
value: [0.75146915 0.46897905 0.71910121 0.790965 0.56360186 0.75146915
0.65120968 0.77822581 0.49241885 0.52419355]
mean value: 0.6491633296175138
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.875 0.734375 0.859375 0.890625 0.78125 0.875
0.82539683 0.88888889 0.74603175 0.76190476]
mean value: 0.8237847222222222
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.87878788 0.73015873 0.85714286 0.88135593 0.77419355 0.87096774
0.82539683 0.88888889 0.75757576 0.76190476]
mean value: 0.8226372922381671
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85294118 0.74193548 0.87096774 0.96296296 0.8 0.9
0.8125 0.875 0.73529412 0.77419355]
mean value: 0.8325795031274158
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90625 0.71875 0.84375 0.8125 0.75 0.84375
0.83870968 0.90322581 0.78125 0.75 ]
mean value: 0.8148185483870968
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.875 0.734375 0.859375 0.890625 0.78125 0.875
0.82560484 0.8891129 0.74546371 0.76209677]
mean value: 0.8237903225806452
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.78378378 0.575 0.75 0.78787879 0.63157895 0.77142857
0.7027027 0.8 0.6097561 0.61538462]
mean value: 0.7027513506107858
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.15573025 0.14470005 0.14443851 0.14479327 0.14609981 0.14602733
0.14412546 0.14521909 0.14652395 0.14871454]
mean value: 0.146637225151062
key: score_time
value: [0.01821351 0.01839828 0.01836967 0.01827264 0.01836801 0.01833248
0.01864052 0.01838899 0.01829696 0.02000189]
mean value: 0.018528294563293458
key: test_mcc
value: [0.46897905 0.4113018 0.5336001 0.5 0.42333825 0.62994079
0.51058887 0.49193548 0.42986904 0.52371369]
mean value: 0.49232670574184606
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.734375 0.703125 0.765625 0.75 0.703125 0.8125
0.74603175 0.74603175 0.71428571 0.76190476]
mean value: 0.7437003968253968
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.73846154 0.6779661 0.75409836 0.75 0.73972603 0.82352941
0.77142857 0.74193548 0.70967742 0.76923077]
mean value: 0.7476053683859305
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.74074074 0.79310345 0.75 0.65853659 0.77777778
0.69230769 0.74193548 0.73333333 0.75757576]
mean value: 0.7372583546520712
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.625 0.71875 0.75 0.84375 0.875
0.87096774 0.74193548 0.6875 0.78125 ]
mean value: 0.7644153225806452
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.734375 0.703125 0.765625 0.75 0.703125 0.8125
0.74798387 0.74596774 0.71471774 0.76159274]
mean value: 0.7439012096774194
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.58536585 0.51282051 0.60526316 0.6 0.58695652 0.7
0.62790698 0.58974359 0.55 0.625 ]
mean value: 0.5983056612600692
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01084328 0.01262331 0.01136661 0.01074314 0.01116204 0.01096058
0.01086903 0.0107832 0.01119852 0.01064134]
mean value: 0.01111910343170166
key: score_time
value: [0.0092206 0.01025367 0.00897503 0.00894499 0.00896931 0.00879312
0.00883961 0.00905681 0.00896764 0.00889587]
mean value: 0.009091663360595702
key: test_mcc
value: [0.19088543 0.28138743 0.25451391 0.2847474 0.13159034 0.34391797
0.26942496 0.21080523 0.07809475 0.18338233]
mean value: 0.2228749746098926
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.59375 0.640625 0.625 0.640625 0.5625 0.671875
0.63492063 0.6031746 0.53968254 0.58730159]
mean value: 0.6099454365079365
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.62857143 0.64615385 0.5862069 0.61016949 0.62162162 0.67692308
0.62295082 0.62686567 0.56716418 0.65789474]
mean value: 0.6244521768607626
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.57894737 0.63636364 0.65384615 0.66666667 0.54761905 0.66666667
0.63333333 0.58333333 0.54285714 0.56818182]
mean value: 0.6077815167288851
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.6875 0.65625 0.53125 0.5625 0.71875 0.6875
0.61290323 0.67741935 0.59375 0.78125 ]
mean value: 0.6509072580645161
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.59375 0.640625 0.625 0.640625 0.5625 0.671875
0.63457661 0.60433468 0.53881048 0.58417339]
mean value: 0.6096270161290323
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.45833333 0.47727273 0.41463415 0.43902439 0.45098039 0.51162791
0.45238095 0.45652174 0.39583333 0.49019608]
mean value: 0.4546804999601126
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.42
Accuracy on Blind test: 0.71
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.13523126 2.13666153 2.17790389 2.1243844 2.14154196 2.14696455
2.11618638 2.12946486 2.18791819 2.1946516 ]
mean value: 2.14909086227417
key: score_time
value: [0.09532213 0.09471822 0.1032505 0.09645987 0.10374951 0.09507036
0.09460235 0.0944438 0.09846973 0.09542656]
mean value: 0.0971513032913208
key: test_mcc
value: [0.81409158 0.65657067 0.75146915 0.78163175 0.75592895 0.68884672
0.65419917 0.74772995 0.68245968 0.65821474]
mean value: 0.7191142340192237
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90625 0.828125 0.875 0.890625 0.875 0.84375
0.82539683 0.87301587 0.84126984 0.82539683]
mean value: 0.8583829365079365
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.83076923 0.87096774 0.89230769 0.88235294 0.83870968
0.83076923 0.875 0.84375 0.84057971]
mean value: 0.86142971336133
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.88235294 0.81818182 0.9 0.87878788 0.83333333 0.86666667
0.79411765 0.84848485 0.84375 0.78378378]
mean value: 0.8449458917473623
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.84375 0.84375 0.90625 0.9375 0.8125
0.87096774 0.90322581 0.84375 0.90625 ]
mean value: 0.8805443548387096
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90625 0.828125 0.875 0.890625 0.875 0.84375
0.82610887 0.8734879 0.84122984 0.82409274]
mean value: 0.858366935483871
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.71052632 0.77142857 0.80555556 0.78947368 0.72222222
0.71052632 0.77777778 0.72972973 0.725 ]
mean value: 0.7575573505836664
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.61
Accuracy on Blind test: 0.81
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [0.97181416 1.06746793 1.03298688 0.99328613 1.07194424 1.03769469
1.07363486 1.10688138 0.99139166 1.02051616]
mean value: 1.0367618083953858
key: score_time
value: [0.19423556 0.18731165 0.27005219 0.29324675 0.28291416 0.22762942
0.26725793 0.29875374 0.31482267 0.28389359]
mean value: 0.2620117664337158
key: test_mcc
value: [0.81409158 0.6875 0.78163175 0.78163175 0.78470603 0.75
0.72407013 0.71471774 0.71705182 0.68740835]
mean value: 0.744280914258336
key: train_mcc
value: [0.90989218 0.90653139 0.90626497 0.90604313 0.92418486 0.90964713
0.91004139 0.91658347 0.90621124 0.90262515]
mean value: 0.9098024895604151
key: test_accuracy
value: [0.90625 0.84375 0.890625 0.890625 0.890625 0.875
0.85714286 0.85714286 0.85714286 0.84126984]
mean value: 0.8709573412698413
key: train_accuracy
value: [0.95454545 0.9527972 0.9527972 0.9527972 0.96153846 0.95454545
0.95462478 0.95811518 0.95287958 0.95113438]
mean value: 0.9545774905722549
key: test_fscore
value: [0.90909091 0.84375 0.88888889 0.89230769 0.89552239 0.875
0.86567164 0.85714286 0.86567164 0.85294118]
mean value: 0.8745987195542726
key: train_fscore
value: [0.95547945 0.95384615 0.95368782 0.9535284 0.96245734 0.95532646
0.9556314 0.95876289 0.9535284 0.95172414]
mean value: 0.95539724483478
key: test_precision
value: [0.88235294 0.84375 0.90322581 0.87878788 0.85714286 0.875
0.80555556 0.84375 0.82857143 0.80555556]
mean value: 0.8523692023241359
key: train_precision
value: [0.93624161 0.93311037 0.93602694 0.93898305 0.94 0.93918919
0.93645485 0.94576271 0.93898305 0.93877551]
mean value: 0.9383527277109088
key: test_recall
value: [0.9375 0.84375 0.875 0.90625 0.9375 0.875
0.93548387 0.87096774 0.90625 0.90625 ]
mean value: 0.8993951612903226
key: train_recall
value: [0.97552448 0.97552448 0.97202797 0.96853147 0.98601399 0.97202797
0.97560976 0.97212544 0.96853147 0.96503497]
mean value: 0.9730951974854414
key: test_roc_auc
value: [0.90625 0.84375 0.890625 0.890625 0.890625 0.875
0.85836694 0.85735887 0.85635081 0.84022177]
mean value: 0.8709173387096774
key: train_roc_auc
value: [0.95454545 0.9527972 0.9527972 0.9527972 0.96153846 0.95454545
0.95458809 0.95809069 0.95290685 0.9511586 ]
mean value: 0.9545765210399356
key: test_jcc
value: [0.83333333 0.72972973 0.8 0.80555556 0.81081081 0.77777778
0.76315789 0.75 0.76315789 0.74358974]
mean value: 0.7777112740270635
key: train_jcc
value: [0.9147541 0.91176471 0.91147541 0.91118421 0.92763158 0.91447368
0.91503268 0.92079208 0.91118421 0.90789474]
mean value: 0.9146187394078189
MCC on Blind test: 0.64
Accuracy on Blind test: 0.82
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02556682 0.01088738 0.01087403 0.01076269 0.01079369 0.01202536
0.0111661 0.01087999 0.01215863 0.01092458]
mean value: 0.01260392665863037
key: score_time
value: [0.01301289 0.00922108 0.00939584 0.00923276 0.00920796 0.00946498
0.00923038 0.00919342 0.00994778 0.00926733]
mean value: 0.009717440605163575
key: test_mcc
value: [0.31814238 0.34527065 0.44539933 0.34391797 0.42333825 0.62622429
0.36629686 0.42842742 0.33367758 0.49193548]
mean value: 0.4122630210919585
key: train_mcc
value: [0.53152212 0.51229647 0.48732947 0.48309663 0.4795166 0.51083262
0.46627502 0.46655817 0.5205467 0.5054141 ]
mean value: 0.4963387918137142
key: test_accuracy
value: [0.65625 0.671875 0.71875 0.671875 0.703125 0.8125
0.68253968 0.71428571 0.66666667 0.74603175]
mean value: 0.704389880952381
key: train_accuracy
value: [0.76398601 0.75524476 0.74300699 0.74125874 0.73951049 0.7534965
0.73298429 0.73298429 0.7591623 0.7521815 ]
mean value: 0.7473815887428453
key: test_fscore
value: [0.68571429 0.6557377 0.68965517 0.67692308 0.73972603 0.81818182
0.6875 0.70967742 0.68656716 0.75 ]
mean value: 0.709968266908221
key: train_fscore
value: [0.7768595 0.76510067 0.75210793 0.74744027 0.74529915 0.76771005
0.73846154 0.74023769 0.76923077 0.75932203]
mean value: 0.7561769601426575
key: test_precision
value: [0.63157895 0.68965517 0.76923077 0.66666667 0.65853659 0.79411765
0.66666667 0.70967742 0.65714286 0.75 ]
mean value: 0.6993272731268689
key: train_precision
value: [0.73667712 0.73548387 0.72638436 0.73 0.72909699 0.7258567
0.72483221 0.7218543 0.73717949 0.73684211]
mean value: 0.7304207151405426
key: test_recall
value: [0.75 0.625 0.625 0.6875 0.84375 0.84375
0.70967742 0.70967742 0.71875 0.75 ]
mean value: 0.7263104838709677
key: train_recall
value: [0.82167832 0.7972028 0.77972028 0.76573427 0.76223776 0.81468531
0.75261324 0.75958188 0.8041958 0.78321678]
mean value: 0.7840866450622548
key: test_roc_auc
value: [0.65625 0.671875 0.71875 0.671875 0.703125 0.8125
0.68296371 0.71421371 0.66582661 0.74596774]
mean value: 0.7043346774193548
key: train_roc_auc
value: [0.76398601 0.75524476 0.74300699 0.74125874 0.73951049 0.7534965
0.73294998 0.73293779 0.75924076 0.75223557]
mean value: 0.7473867595818815
key: test_jcc
value: [0.52173913 0.48780488 0.52631579 0.51162791 0.58695652 0.69230769
0.52380952 0.55 0.52272727 0.6 ]
mean value: 0.5523288715517611
key: train_jcc
value: [0.63513514 0.61956522 0.6027027 0.59673025 0.59400545 0.62299465
0.58536585 0.58760108 0.625 0.61202186]
mean value: 0.6081122192207598
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.30241895 0.10555077 0.10573483 0.10488105 0.10599661 0.11018038
0.10661006 0.10959363 0.10565329 0.14151597]
mean value: 0.129813551902771
key: score_time
value: [0.01123381 0.01157808 0.01123309 0.01145458 0.01132345 0.01121831
0.01126218 0.01123905 0.01134586 0.01125169]
mean value: 0.011314010620117188
key: test_mcc
value: [0.875 0.71910121 0.81409158 0.84748251 0.78163175 0.75146915
0.68865372 0.74772995 0.77822581 0.77800241]
mean value: 0.7781388085762072
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9375 0.859375 0.90625 0.921875 0.890625 0.875
0.84126984 0.87301587 0.88888889 0.88888889]
mean value: 0.8882688492063492
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.9375 0.86153846 0.90322581 0.91803279 0.89230769 0.87096774
0.84848485 0.875 0.88888889 0.89230769]
mean value: 0.8888253918799927
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9375 0.84848485 0.93333333 0.96551724 0.87878788 0.9
0.8 0.84848485 0.90322581 0.87878788]
mean value: 0.8894121835709712
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.9375 0.875 0.875 0.875 0.90625 0.84375
0.90322581 0.90322581 0.875 0.90625 ]
mean value: 0.8900201612903226
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9375 0.859375 0.90625 0.921875 0.890625 0.875
0.8422379 0.8734879 0.8891129 0.88860887]
mean value: 0.8884072580645161
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88235294 0.75675676 0.82352941 0.84848485 0.80555556 0.77142857
0.73684211 0.77777778 0.8 0.80555556]
mean value: 0.80082835237634
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.0486443 0.06402254 0.04328895 0.08116221 0.05702353 0.09169793
0.07593226 0.04668522 0.07705951 0.04529047]
mean value: 0.06308069229125976
key: score_time
value: [0.01902747 0.01237202 0.01238132 0.01268625 0.02459049 0.01931906
0.01228476 0.01229668 0.01233292 0.02081442]
mean value: 0.015810537338256835
key: test_mcc
value: [0.38177086 0.31311215 0.65657067 0.55359617 0.4163332 0.65657067
0.53874599 0.61982085 0.30272467 0.46146899]
mean value: 0.49007142138993015
key: train_mcc
value: [0.77812872 0.80152775 0.7847726 0.76315265 0.76780686 0.76042114
0.79618159 0.78218049 0.76695074 0.8069451 ]
mean value: 0.780806763235838
key: test_accuracy
value: [0.6875 0.65625 0.828125 0.765625 0.703125 0.828125
0.76190476 0.80952381 0.65079365 0.73015873]
mean value: 0.7421130952380952
key: train_accuracy
value: [0.88811189 0.90034965 0.89160839 0.88111888 0.88286713 0.87937063
0.89703316 0.89005236 0.88307155 0.90226876]
mean value: 0.8895852402396905
key: test_fscore
value: [0.71428571 0.64516129 0.83076923 0.79452055 0.73239437 0.82539683
0.7826087 0.8 0.64516129 0.74626866]
mean value: 0.7516566617607912
key: train_fscore
value: [0.89189189 0.9025641 0.89491525 0.88395904 0.88701518 0.88324873
0.90084034 0.89411765 0.88547009 0.90572391]
mean value: 0.8929746175479386
key: test_precision
value: [0.65789474 0.66666667 0.81818182 0.70731707 0.66666667 0.83870968
0.71052632 0.82758621 0.66666667 0.71428571]
mean value: 0.727450154258575
key: train_precision
value: [0.8627451 0.88294314 0.86842105 0.86333333 0.85667752 0.8557377
0.87012987 0.86363636 0.86622074 0.87337662]
mean value: 0.8663221450093648
key: test_recall
value: [0.78125 0.625 0.84375 0.90625 0.8125 0.8125
0.87096774 0.77419355 0.625 0.78125 ]
mean value: 0.783266129032258
key: train_recall
value: [0.92307692 0.92307692 0.92307692 0.90559441 0.91958042 0.91258741
0.93379791 0.92682927 0.90559441 0.94055944]
mean value: 0.9213774030847202
key: test_roc_auc
value: [0.6875 0.65625 0.828125 0.765625 0.703125 0.828125
0.76360887 0.80897177 0.65120968 0.72933468]
mean value: 0.7421875
key: train_roc_auc
value: [0.88811189 0.90034965 0.89160839 0.88111888 0.88286713 0.87937063
0.89696888 0.88998806 0.88311079 0.90233547]
mean value: 0.8895829779976121
key: test_jcc
value: [0.55555556 0.47619048 0.71052632 0.65909091 0.57777778 0.7027027
0.64285714 0.66666667 0.47619048 0.5952381 ]
mean value: 0.6062796118059276
key: train_jcc
value: [0.80487805 0.82242991 0.80981595 0.79204893 0.7969697 0.79090909
0.81957187 0.80851064 0.79447853 0.82769231]
mean value: 0.8067304962826153
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02168322 0.01041508 0.01006579 0.01009798 0.0101316 0.01007843
0.00995684 0.01002717 0.01006675 0.01005673]
mean value: 0.01125795841217041
key: score_time
value: [0.01007843 0.00911808 0.00869632 0.00869465 0.00870037 0.00874281
0.00876069 0.00872588 0.00868654 0.0087564 ]
mean value: 0.00889601707458496
key: test_mcc
value: [0.51639778 0.40644851 0.34527065 0.25197632 0.54443572 0.5336001
0.46743768 0.4969666 0.5892604 0.58728587]
mean value: 0.4739079632497716
key: train_mcc
value: [0.51458239 0.49592887 0.46744423 0.48759325 0.52117846 0.51787859
0.48968865 0.48599733 0.52422453 0.50849873]
mean value: 0.5013015026995373
key: test_accuracy
value: [0.75 0.703125 0.671875 0.625 0.765625 0.765625
0.73015873 0.74603175 0.79365079 0.79365079]
mean value: 0.7344742063492063
key: train_accuracy
value: [0.75524476 0.7465035 0.73251748 0.74300699 0.75874126 0.75699301
0.7434555 0.7417103 0.7609075 0.7521815 ]
mean value: 0.7491261792308913
key: test_fscore
value: [0.77777778 0.70769231 0.6557377 0.64705882 0.78873239 0.7761194
0.74626866 0.75757576 0.80597015 0.8 ]
mean value: 0.7462932974814709
key: train_fscore
value: [0.76973684 0.75953566 0.74542429 0.75294118 0.77227723 0.77100494
0.75702479 0.75496689 0.77128548 0.76644737]
mean value: 0.7620644661560988
key: test_precision
value: [0.7 0.6969697 0.68965517 0.61111111 0.71794872 0.74285714
0.69444444 0.71428571 0.77142857 0.78787879]
mean value: 0.712657935933798
key: train_precision
value: [0.72670807 0.72239748 0.71111111 0.72491909 0.73125 0.72897196
0.72012579 0.7192429 0.73801917 0.72360248]
mean value: 0.7246348060626768
key: test_recall
value: [0.875 0.71875 0.625 0.6875 0.875 0.8125
0.80645161 0.80645161 0.84375 0.8125 ]
mean value: 0.7862903225806451
key: train_recall
value: [0.81818182 0.8006993 0.78321678 0.78321678 0.81818182 0.81818182
0.79790941 0.79442509 0.80769231 0.81468531]
mean value: 0.8036390438829464
key: test_roc_auc
value: [0.75 0.703125 0.671875 0.625 0.765625 0.765625
0.73135081 0.74697581 0.79284274 0.79334677]
mean value: 0.7345766129032258
key: train_roc_auc
value: [0.75524476 0.7465035 0.73251748 0.74300699 0.75874126 0.75699301
0.7433603 0.74161814 0.76098901 0.75229039]
mean value: 0.7491264832728247
key: test_jcc
value: [0.63636364 0.54761905 0.48780488 0.47826087 0.65116279 0.63414634
0.5952381 0.6097561 0.675 0.66666667]
mean value: 0.5982018423223509
key: train_jcc
value: [0.62566845 0.61229947 0.59416446 0.60377358 0.62903226 0.62734584
0.60904255 0.60638298 0.62771739 0.62133333]
mean value: 0.6156760314698697
MCC on Blind test: 0.26
Accuracy on Blind test: 0.65
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01659179 0.02303815 0.02188993 0.0252893 0.02217102 0.01938987
0.02506042 0.02309823 0.02040172 0.02841473]
mean value: 0.022534513473510744
key: score_time
value: [0.01002479 0.01128006 0.01193643 0.01192284 0.01185226 0.01186895
0.01191616 0.01193762 0.01183701 0.01184344]
mean value: 0.011641955375671387
key: test_mcc
value: [0.50395263 0.45184806 0.60848698 0.63628476 0.43033148 0.3146266
0.65419917 0.52679717 0.49193548 0.37005896]
mean value: 0.49885212995054273
key: train_mcc
value: [0.64492467 0.70137886 0.66009632 0.68270793 0.48108781 0.36731544
0.65453838 0.6739828 0.67888209 0.53301232]
mean value: 0.6077926631426009
key: test_accuracy
value: [0.75 0.71875 0.796875 0.8125 0.65625 0.609375
0.82539683 0.76190476 0.74603175 0.66666667]
mean value: 0.734375
key: train_accuracy
value: [0.81293706 0.84965035 0.82167832 0.83566434 0.69405594 0.61888112
0.82722513 0.83420593 0.83944154 0.73472949]
mean value: 0.786846922710797
key: test_fscore
value: [0.73333333 0.67857143 0.77192982 0.79310345 0.74418605 0.71264368
0.83076923 0.76923077 0.75 0.58823529]
mean value: 0.7372003053532222
key: train_fscore
value: [0.78727634 0.84363636 0.7992126 0.81923077 0.76383266 0.72405063
0.82901554 0.84451718 0.83916084 0.65137615]
mean value: 0.7901309079655531
key: test_precision
value: [0.78571429 0.79166667 0.88 0.88461538 0.59259259 0.56363636
0.79411765 0.73529412 0.75 0.78947368]
mean value: 0.7567110742141702
key: train_precision
value: [0.9124424 0.87878788 0.91441441 0.91025641 0.62197802 0.56746032
0.82191781 0.7962963 0.83916084 0.94666667]
mean value: 0.8209381049553387
key: test_recall
value: [0.6875 0.59375 0.6875 0.71875 1. 0.96875
0.87096774 0.80645161 0.75 0.46875 ]
mean value: 0.755241935483871
key: train_recall
value: [0.69230769 0.81118881 0.70979021 0.74475524 0.98951049 1.
0.83623693 0.8989547 0.83916084 0.4965035 ]
mean value: 0.8018408420847445
key: test_roc_auc
value: [0.75 0.71875 0.796875 0.8125 0.65625 0.609375
0.82610887 0.76260081 0.74596774 0.66985887]
mean value: 0.7348286290322581
key: train_roc_auc
value: [0.81293706 0.84965035 0.82167832 0.83566434 0.69405594 0.61888112
0.82720938 0.83409274 0.83944105 0.73431447]
mean value: 0.7867924758168661
key: test_jcc
value: [0.57894737 0.51351351 0.62857143 0.65714286 0.59259259 0.55357143
0.71052632 0.625 0.6 0.41666667]
mean value: 0.5876532171269013
key: train_jcc
value: [0.64918033 0.72955975 0.66557377 0.69381107 0.61790393 0.56746032
0.7079646 0.73087819 0.72289157 0.4829932 ]
mean value: 0.656821672158094
MCC on Blind test: 0.65
Accuracy on Blind test: 0.82
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03064966 0.02935052 0.02632499 0.02779365 0.02897787 0.03325915
0.0250783 0.03014135 0.02610087 0.0317347 ]
mean value: 0.02894110679626465
key: score_time
value: [0.01195979 0.01186347 0.01194692 0.0119195 0.01188731 0.01189685
0.01187181 0.01184964 0.0119009 0.01189089]
mean value: 0.01189870834350586
key: test_mcc
value: [0.45990694 0.32897585 0.65657067 0.52915026 0.40451992 0.4330127
0.57759945 0.53874599 0.4969666 0.55909213]
mean value: 0.49845405061750964
key: train_mcc
value: [0.67615992 0.66717709 0.71805284 0.57509353 0.54781734 0.53942373
0.62689492 0.67079344 0.68718637 0.7400109 ]
mean value: 0.6448610096309454
key: test_accuracy
value: [0.703125 0.65625 0.828125 0.71875 0.6875 0.6875
0.76190476 0.76190476 0.74603175 0.77777778]
mean value: 0.7328869047619048
key: train_accuracy
value: [0.82167832 0.81993007 0.85839161 0.7534965 0.73776224 0.72727273
0.79581152 0.81849913 0.84293194 0.86736475]
mean value: 0.8043138798374401
key: test_fscore
value: [0.75949367 0.7027027 0.83076923 0.7804878 0.61538462 0.75
0.8 0.7826087 0.73333333 0.79411765]
mean value: 0.7548897700665005
key: train_fscore
value: [0.84545455 0.84226646 0.86247878 0.80056577 0.65116279 0.78512397
0.82511211 0.84337349 0.83754513 0.87458746]
mean value: 0.8167670500726047
key: test_precision
value: [0.63829787 0.61904762 0.81818182 0.64 0.8 0.625
0.68181818 0.71052632 0.78571429 0.75 ]
mean value: 0.7068586092891804
key: train_precision
value: [0.7459893 0.7493188 0.83828383 0.67220903 0.97222222 0.64772727
0.72251309 0.74270557 0.86567164 0.828125 ]
mean value: 0.778476575645141
key: test_recall
value: [0.9375 0.8125 0.84375 1. 0.5 0.9375
0.96774194 0.87096774 0.6875 0.84375 ]
mean value: 0.8401209677419355
key: train_recall
value: [0.97552448 0.96153846 0.88811189 0.98951049 0.48951049 0.9965035
0.96167247 0.97560976 0.81118881 0.92657343]
mean value: 0.8975743768426695
key: test_roc_auc
value: [0.703125 0.65625 0.828125 0.71875 0.6875 0.6875
0.76512097 0.76360887 0.74697581 0.77671371]
mean value: 0.733366935483871
key: train_roc_auc
value: [0.82167832 0.81993007 0.85839161 0.7534965 0.73776224 0.72727273
0.79552155 0.81822446 0.84287664 0.8674679 ]
mean value: 0.8042622012134207
key: test_jcc
value: [0.6122449 0.54166667 0.71052632 0.64 0.44444444 0.6
0.66666667 0.64285714 0.57894737 0.65853659]
mean value: 0.6095890088170485
key: train_jcc
value: [0.73228346 0.72751323 0.75820896 0.66745283 0.48275862 0.6462585
0.70229008 0.72916667 0.72049689 0.7771261 ]
mean value: 0.6943555338702959
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.21230507 0.20174456 0.2023685 0.20934129 0.20010328 0.20097804
0.20069718 0.20317173 0.20407915 0.20166779]
mean value: 0.203645658493042
key: score_time
value: [0.01540875 0.01660895 0.01600695 0.01670003 0.01540923 0.01591396
0.01552558 0.01626444 0.01585245 0.01715422]
mean value: 0.016084456443786622
key: test_mcc
value: [0.72192954 0.68884672 0.72192954 0.84416229 0.65915306 0.59404013
0.71471774 0.77822581 0.8415746 0.77800241]
mean value: 0.7342581846545592
key: train_mcc
value: [0.90218614 0.90604313 0.90911314 0.87489633 0.89519245 0.93027467
0.89904215 0.90959958 0.91301596 0.891802 ]
mean value: 0.9031165549141992
key: test_accuracy
value: [0.859375 0.84375 0.859375 0.921875 0.828125 0.796875
0.85714286 0.88888889 0.92063492 0.88888889]
mean value: 0.8664930555555556
key: train_accuracy
value: [0.95104895 0.9527972 0.95454545 0.93706294 0.94755245 0.96503497
0.94938918 0.95462478 0.95636998 0.94589878]
mean value: 0.9514324680555047
key: test_fscore
value: [0.86567164 0.84848485 0.85245902 0.92307692 0.8358209 0.8
0.85714286 0.88888889 0.92307692 0.89230769]
mean value: 0.8686929686685008
key: train_fscore
value: [0.95138889 0.9535284 0.95438596 0.93835616 0.94791667 0.96539792
0.95008606 0.95532646 0.95682211 0.94570928]
mean value: 0.9518917916081902
key: test_precision
value: [0.82857143 0.82352941 0.89655172 0.90909091 0.8 0.78787879
0.84375 0.875 0.90909091 0.87878788]
mean value: 0.855225104932255
key: train_precision
value: [0.94482759 0.93898305 0.95774648 0.91946309 0.94137931 0.95547945
0.93877551 0.94237288 0.94539249 0.94736842]
mean value: 0.943178826965576
key: test_recall
value: [0.90625 0.875 0.8125 0.9375 0.875 0.8125
0.87096774 0.90322581 0.9375 0.90625 ]
mean value: 0.8836693548387097
key: train_recall
value: [0.95804196 0.96853147 0.95104895 0.95804196 0.95454545 0.97552448
0.96167247 0.96864111 0.96853147 0.94405594]
mean value: 0.9608635267171852
key: test_roc_auc
value: [0.859375 0.84375 0.859375 0.921875 0.828125 0.796875
0.85735887 0.8891129 0.9203629 0.88860887]
mean value: 0.8664818548387097
key: train_roc_auc
value: [0.95104895 0.9527972 0.95454545 0.93706294 0.94755245 0.96503497
0.94936771 0.95460028 0.95639117 0.94589557]
mean value: 0.9514296678930825
key: test_jcc
value: [0.76315789 0.73684211 0.74285714 0.85714286 0.71794872 0.66666667
0.75 0.8 0.85714286 0.80555556]
mean value: 0.7697313797313797
key: train_jcc
value: [0.90728477 0.91118421 0.91275168 0.88387097 0.9009901 0.93311037
0.90491803 0.91447368 0.91721854 0.89700997]
mean value: 0.9082812318056577
MCC on Blind test: 0.61
Accuracy on Blind test: 0.81
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.0958612 0.12185383 0.12039638 0.10832477 0.08756161 0.10275364
0.10701203 0.08953285 0.11769557 0.09121203]
mean value: 0.10422039031982422
key: score_time
value: [0.02607036 0.03825665 0.0407958 0.01853371 0.02605748 0.03799462
0.01898909 0.02665234 0.0244019 0.02416778]
mean value: 0.028191971778869628
key: test_mcc
value: [0.875 0.71910121 0.8125 0.84748251 0.62622429 0.75146915
0.74596774 0.68352185 0.74722285 0.74596774]
mean value: 0.7554457343130832
key: train_mcc
value: [0.9688217 0.98257154 0.98951654 0.97911675 0.98266766 0.9860381
0.97905753 0.97229095 0.98255382 0.97908113]
mean value: 0.9801715709042963
key: test_accuracy
value: [0.9375 0.859375 0.90625 0.921875 0.8125 0.875
0.87301587 0.84126984 0.87301587 0.87301587]
mean value: 0.877281746031746
key: train_accuracy
value: [0.98426573 0.99125874 0.99475524 0.98951049 0.99125874 0.99300699
0.9895288 0.98603839 0.991274 0.9895288 ]
mean value: 0.9900425926603937
key: test_fscore
value: [0.9375 0.86153846 0.90625 0.91803279 0.81818182 0.87096774
0.87096774 0.83333333 0.87878788 0.875 ]
mean value: 0.8770559762597705
key: train_fscore
value: [0.9840708 0.99121265 0.99474606 0.98943662 0.99118166 0.99298246
0.98954704 0.98591549 0.99124343 0.98947368]
mean value: 0.9899809891560609
key: test_precision
value: [0.9375 0.84848485 0.90625 0.96551724 0.79411765 0.9
0.87096774 0.86206897 0.85294118 0.875 ]
mean value: 0.8812847620846296
key: train_precision
value: [0.99641577 0.99646643 0.99649123 0.9964539 1. 0.99647887
0.98954704 0.99644128 0.99298246 0.99295775]
mean value: 0.9954234725809098
key: test_recall
value: [0.9375 0.875 0.90625 0.875 0.84375 0.84375
0.87096774 0.80645161 0.90625 0.875 ]
mean value: 0.873991935483871
key: train_recall
value: [0.97202797 0.98601399 0.99300699 0.98251748 0.98251748 0.98951049
0.98954704 0.97560976 0.98951049 0.98601399]
mean value: 0.9846275675543968
key: test_roc_auc
value: [0.9375 0.859375 0.90625 0.921875 0.8125 0.875
0.87298387 0.84072581 0.87247984 0.87298387]
mean value: 0.8771673387096774
key: train_roc_auc
value: [0.98426573 0.99125874 0.99475524 0.98951049 0.99125874 0.99300699
0.98952876 0.98605663 0.99127092 0.98952267]
mean value: 0.9900434930922736
key: test_jcc
value: [0.88235294 0.75675676 0.82857143 0.84848485 0.69230769 0.77142857
0.77142857 0.71428571 0.78378378 0.77777778]
mean value: 0.7827178086001616
key: train_jcc
value: [0.96864111 0.9825784 0.98954704 0.97909408 0.98251748 0.98606272
0.97931034 0.97222222 0.98263889 0.97916667]
mean value: 0.9801778950070581
MCC on Blind test: 0.75
Accuracy on Blind test: 0.87
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.14886832 0.21103811 0.20516992 0.16590166 0.21451283 0.21358299
0.2202301 0.22097421 0.24722981 0.2264185 ]
mean value: 0.20739264488220216
key: score_time
value: [0.01676941 0.02704263 0.02702403 0.01622701 0.03246331 0.03205514
0.03158545 0.03284216 0.0324614 0.03306985]
mean value: 0.02815403938293457
key: test_mcc
value: [0.37573457 0.3125 0.4113018 0.34527065 0.4163332 0.40644851
0.53874599 0.5026181 0.23790323 0.46014151]
mean value: 0.4006997549871454
key: train_mcc
value: [0.93788887 0.93425771 0.93485309 0.93425771 0.93152879 0.92389053
0.93163919 0.93753513 0.93437548 0.9447324 ]
mean value: 0.9344958888426667
key: test_accuracy
value: [0.6875 0.65625 0.703125 0.671875 0.703125 0.703125
0.76190476 0.74603175 0.61904762 0.73015873]
mean value: 0.6982142857142857
key: train_accuracy
value: [0.96853147 0.96678322 0.96678322 0.96678322 0.96503497 0.96153846
0.96509599 0.96858639 0.96684119 0.97207679]
mean value: 0.9668054894494685
key: test_fscore
value: [0.67741935 0.65625 0.6779661 0.68656716 0.73239437 0.70769231
0.7826087 0.76470588 0.625 0.73846154]
mean value: 0.7049065411068873
key: train_fscore
value: [0.96917808 0.96740995 0.96763203 0.96740995 0.96598639 0.96232877
0.96610169 0.96907216 0.96740995 0.97250859]
mean value: 0.9675037567685204
key: test_precision
value: [0.7 0.65625 0.74074074 0.65714286 0.66666667 0.6969697
0.71052632 0.7027027 0.625 0.72727273]
mean value: 0.6883271707284865
key: train_precision
value: [0.94966443 0.94949495 0.94352159 0.94949495 0.94039735 0.94295302
0.94059406 0.9559322 0.94949495 0.95608108]
mean value: 0.9477628587703893
key: test_recall
value: [0.65625 0.65625 0.625 0.71875 0.8125 0.71875
0.87096774 0.83870968 0.625 0.75 ]
mean value: 0.7272177419354838
key: train_recall
value: [0.98951049 0.98601399 0.99300699 0.98601399 0.99300699 0.98251748
0.99303136 0.9825784 0.98601399 0.98951049]
mean value: 0.9881204161691967
key: test_roc_auc
value: [0.6875 0.65625 0.703125 0.671875 0.703125 0.703125
0.76360887 0.74747984 0.61895161 0.72983871]
mean value: 0.6984879032258065
key: train_roc_auc
value: [0.96853147 0.96678322 0.96678322 0.96678322 0.96503497 0.96153846
0.96504715 0.96856193 0.96687459 0.97210716]
mean value: 0.9668045369264882
key: test_jcc
value: [0.51219512 0.48837209 0.51282051 0.52272727 0.57777778 0.54761905
0.64285714 0.61904762 0.45454545 0.58536585]
mean value: 0.546332789602784
key: train_jcc
value: [0.94019934 0.93687708 0.93729373 0.93687708 0.93421053 0.92739274
0.93442623 0.94 0.93687708 0.94648829]
mean value: 0.9370642083569285
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.86300206 0.8570118 0.84512067 0.83157301 0.83923888 0.83239603
0.83465123 0.83852315 0.84167743 0.8304162 ]
mean value: 0.8413610458374023
key: score_time
value: [0.00991249 0.01026297 0.00953531 0.00981331 0.0096705 0.00957942
0.00936604 0.01012897 0.00972867 0.00952649]
mean value: 0.009752416610717773
key: test_mcc
value: [0.84416229 0.78163175 0.8125 0.81409158 0.81409158 0.75
0.77822581 0.84173387 0.8415746 0.74722285]
mean value: 0.802523432013684
key: train_mcc
value: [0.95806538 0.97553044 0.9688217 0.97557815 0.96853739 0.98601399
0.96859238 0.96863997 0.96511897 0.96530633]
mean value: 0.9700204697320085
key: test_accuracy
value: [0.921875 0.890625 0.90625 0.90625 0.90625 0.875
0.88888889 0.92063492 0.92063492 0.87301587]
mean value: 0.9009424603174603
key: train_accuracy
value: [0.97902098 0.98776224 0.98426573 0.98776224 0.98426573 0.99300699
0.98429319 0.98429319 0.98254799 0.98254799]
mean value: 0.9849766289556866
key: test_fscore
value: [0.92307692 0.89230769 0.90625 0.90322581 0.90909091 0.875
0.88888889 0.92063492 0.92307692 0.87878788]
mean value: 0.9020339942315748
key: train_fscore
value: [0.97894737 0.9877836 0.9840708 0.98769772 0.98423818 0.99300699
0.98429319 0.98423818 0.98245614 0.98233216]
mean value: 0.9849064315104781
key: test_precision
value: [0.90909091 0.87878788 0.90625 0.93333333 0.88235294 0.875
0.875 0.90625 0.90909091 0.85294118]
mean value: 0.8928097147950089
key: train_precision
value: [0.98239437 0.98606272 0.99641577 0.99293286 0.98596491 0.99300699
0.98601399 0.98943662 0.98591549 0.99285714]
mean value: 0.989100086360223
key: test_recall
value: [0.9375 0.90625 0.90625 0.875 0.9375 0.875
0.90322581 0.93548387 0.9375 0.90625 ]
mean value: 0.9119959677419355
key: train_recall
value: [0.97552448 0.98951049 0.97202797 0.98251748 0.98251748 0.99300699
0.9825784 0.97909408 0.97902098 0.97202797]
mean value: 0.9807826320021442
key: test_roc_auc
value: [0.921875 0.890625 0.90625 0.90625 0.90625 0.875
0.8891129 0.92086694 0.9203629 0.87247984]
mean value: 0.9009072580645161
key: train_roc_auc
value: [0.97902098 0.98776224 0.98426573 0.98776224 0.98426573 0.99300699
0.98429619 0.98430228 0.98254185 0.98252967]
mean value: 0.9849753904631954
key: test_jcc
value: [0.85714286 0.80555556 0.82857143 0.82352941 0.83333333 0.77777778
0.8 0.85294118 0.85714286 0.78378378]
mean value: 0.8219778181542887
key: train_jcc
value: [0.95876289 0.97586207 0.96864111 0.97569444 0.96896552 0.98611111
0.96907216 0.96896552 0.96551724 0.96527778]
mean value: 0.970286984468989
MCC on Blind test: 0.7
Accuracy on Blind test: 0.85
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03439713 0.03367257 0.0339005 0.03395748 0.03364205 0.03365755
0.03340864 0.04170251 0.04242253 0.05480218]
mean value: 0.03755631446838379
key: score_time
value: [0.01224303 0.01276779 0.01289511 0.01367021 0.0135715 0.01362062
0.01876259 0.01285124 0.01362205 0.02431679]
mean value: 0.014832091331481934
key: test_mcc
value: [ 0. 0.05006262 0.18898224 0.17466675 -0.12598816 0.05006262
0.12961896 0.04490133 -0.12607181 0.06339049]
mean value: 0.044962502731329346
key: train_mcc
value: [0.27050089 0.23526698 0.26676028 0.23937689 0.26298076 0.25139019
0.23129033 0.24763842 0.27712658 0.24677557]
mean value: 0.2529106884487375
key: test_accuracy
value: [0.5 0.515625 0.5625 0.546875 0.484375 0.515625
0.53968254 0.50793651 0.47619048 0.52380952]
mean value: 0.5172619047619047
key: train_accuracy
value: [0.56818182 0.55244755 0.56643357 0.5541958 0.56468531 0.55944056
0.55148342 0.55846422 0.57068063 0.55671902]
mean value: 0.5602731910323533
key: test_fscore
value: [0.61904762 0.65168539 0.68181818 0.68131868 0.65263158 0.65168539
0.65882353 0.64367816 0.63736264 0.66666667]
mean value: 0.6544717842009313
key: train_fscore
value: [0.6984127 0.69082126 0.69756098 0.69165659 0.69671133 0.69417476
0.69073406 0.69407497 0.6992665 0.69249395]
mean value: 0.6945907080600471
key: test_precision
value: [0.5 0.50877193 0.53571429 0.52542373 0.49206349 0.50877193
0.51851852 0.5 0.49152542 0.51724138]
mean value: 0.5098030687798137
key: train_precision
value: [0.53658537 0.52767528 0.53558052 0.52865065 0.53457944 0.53159851
0.52757353 0.53148148 0.53759398 0.52962963]
mean value: 0.5320948391649859
key: test_recall
value: [0.8125 0.90625 0.9375 0.96875 0.96875 0.90625
0.90322581 0.90322581 0.90625 0.9375 ]
mean value: 0.9150201612903226
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.5 0.515625 0.5625 0.546875 0.484375 0.515625
0.5453629 0.5141129 0.46925403 0.5171371 ]
mean value: 0.5170866935483871
key: train_roc_auc
value: [0.56818182 0.55244755 0.56643357 0.5541958 0.56468531 0.55944056
0.5506993 0.55769231 0.57142857 0.55749129]
mean value: 0.5602696084403401
key: test_jcc
value: [0.44827586 0.48333333 0.51724138 0.51666667 0.484375 0.48333333
0.49122807 0.47457627 0.46774194 0.5 ]
mean value: 0.4866771851558394
key: train_jcc
value: [0.53658537 0.52767528 0.53558052 0.52865065 0.53457944 0.53159851
0.52757353 0.53148148 0.53759398 0.52962963]
mean value: 0.5320948391649859
MCC on Blind test: 0.06
Accuracy on Blind test: 0.45
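The first-fold QDA test scores above are all consistent with a single confusion matrix, which makes the zero MCC easier to read. The sketch below assumes a 64-sample test fold with 32 positives and 32 negatives (inferred from the 1/64 steps in accuracy and the 1/32 steps in recall, not stated in the log). The counts TP=26, FP=26, FN=6, TN=6 are likewise an inference, and `fold_metrics` is a hypothetical helper.

```python
from math import sqrt


def fold_metrics(tp, fp, fn, tn):
    """Compute the metrics reported in this log from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),
        # MCC is conventionally set to 0 when the denominator is 0
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Assumed counts for QDA test fold 1 (inferred, not printed in the log)
m = fold_metrics(tp=26, fp=26, fn=6, tn=6)
for name, value in m.items():
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the model labels 52 of 64 samples positive, which gives high recall alongside chance-level precision, accuracy, and MCC.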
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.03693748 0.03959489 0.03556442 0.02844191 0.04038453 0.0297482
0.04064775 0.04152107 0.0279665 0.03697538]
mean value: 0.03577821254730225
key: score_time
value: [0.01513863 0.02490067 0.02957106 0.0205822 0.02573943 0.01919627
0.01912403 0.01920748 0.0150106 0.01629949]
mean value: 0.020476984977722167
key: test_mcc
value: [0.42333825 0.53150959 0.65657067 0.69991324 0.60848698 0.62622429
0.63159952 0.5892604 0.46014151 0.52371369]
mean value: 0.5750758136813815
key: train_mcc
value: [0.73686479 0.754191 0.73078337 0.72905754 0.73833893 0.74434091
0.7571405 0.74253082 0.72556417 0.75465049]
mean value: 0.7413462525930328
key: test_accuracy
value: [0.703125 0.765625 0.828125 0.84375 0.796875 0.8125
0.80952381 0.79365079 0.73015873 0.76190476]
mean value: 0.7845238095238095
key: train_accuracy
value: [0.86713287 0.87587413 0.86363636 0.86363636 0.86713287 0.87062937
0.87783595 0.86910995 0.86212914 0.87609075]
mean value: 0.8693207752108275
key: test_fscore
value: [0.73972603 0.76190476 0.83076923 0.85714286 0.81690141 0.81818182
0.82352941 0.77966102 0.73846154 0.76923077]
mean value: 0.7935508840252798
key: train_fscore
value: [0.87248322 0.88067227 0.87 0.86824324 0.87375415 0.87625418
0.88175676 0.87603306 0.86587436 0.88067227]
mean value: 0.8745743513896477
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
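The two SettingWithCopyWarning messages interleaved with this block come from calling `sort_values(..., inplace=True)` on a frame that pandas regards as a slice of another frame. One common way to avoid the warning is sketched below, assuming `rus_CT` and `rus_BT` are column subsets of a larger scores table. The `scores` frame and its values (rounded from the mean test MCC and blind-test MCC lines in this log) are illustrative, and the real code at katg_cd_sl.py:176 and :179 may differ.

```python
import pandas as pd

# Illustrative scores table; values are rounded from this log
scores = pd.DataFrame({
    "model": ["QDA", "Ridge Classifier", "Logistic Regression"],
    "test_mcc": [0.04, 0.58, 0.57],
    "bts_mcc": [0.06, 0.51, 0.43],
})

# .copy() makes each subset an independent frame, and assigning the
# sorted result back avoids mutating a possible view in place.
rus_CT = scores[["model", "test_mcc"]].copy()
rus_CT = rus_CT.sort_values(by=["test_mcc"], ascending=False)

rus_BT = scores[["model", "bts_mcc"]].copy()
rus_BT = rus_BT.sort_values(by=["bts_mcc"], ascending=False)

print(rus_CT)
print(rus_BT)
```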
key: test_precision
value: [0.65853659 0.77419355 0.81818182 0.78947368 0.74358974 0.79411765
0.75675676 0.82142857 0.72727273 0.75757576]
mean value: 0.7641126839827675
key: train_precision
value: [0.83870968 0.84789644 0.83121019 0.83986928 0.83227848 0.83974359
0.8557377 0.83333333 0.84158416 0.84789644]
mean value: 0.8408259297230265
key: test_recall
value: [0.84375 0.75 0.84375 0.9375 0.90625 0.84375
0.90322581 0.74193548 0.75 0.78125 ]
mean value: 0.830141129032258
key: train_recall
value: [0.90909091 0.91608392 0.91258741 0.8986014 0.91958042 0.91608392
0.90940767 0.92334495 0.89160839 0.91608392]
mean value: 0.9112472892960698
key: test_roc_auc
value: [0.703125 0.765625 0.828125 0.84375 0.796875 0.8125
0.8109879 0.79284274 0.72983871 0.76159274]
mean value: 0.7845262096774194
key: train_roc_auc
value: [0.86713287 0.87587413 0.86363636 0.86363636 0.86713287 0.87062937
0.87778076 0.86901513 0.8621805 0.87616042]
mean value: 0.8693178772447065
key: test_jcc
value: [0.58695652 0.61538462 0.71052632 0.75 0.69047619 0.69230769
0.7 0.63888889 0.58536585 0.625 ]
mean value: 0.6594906078244528
key: train_jcc
value: [0.77380952 0.78678679 0.7699115 0.76716418 0.77581121 0.7797619
0.78851964 0.77941176 0.76347305 0.78678679]
mean value: 0.777143635117412
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
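The key/value/mean blocks above have the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given named scorers and `return_train_score=True`. The sketch below is an assumption about how such output could be produced, not the project's actual script: the synthetic data, the `scoring` dictionary, and the print loop are illustrative, and only two of the log's scorer names are reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix and target
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", RidgeClassifier(random_state=42)),
])

# Scorer names become the suffixes of the test_* and train_* keys
scoring = {"mcc": make_scorer(matthews_corrcoef), "accuracy": "accuracy"}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                         return_train_score=True)

for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```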
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.26221371 0.30311131 0.30173469 0.37015843 0.35645747 0.3390913
0.31258631 0.33007884 0.3021841 0.35456944]
mean value: 0.323218560218811
key: score_time
value: [0.0190413 0.0189321 0.0212183 0.02102017 0.02358246 0.01976538
0.01957774 0.04501891 0.02539778 0.02190781]
mean value: 0.0235461950302124
key: test_mcc
value: [0.42333825 0.53150959 0.65657067 0.69991324 0.60848698 0.62622429
0.63159952 0.5892604 0.46014151 0.52371369]
mean value: 0.5750758136813815
key: train_mcc
value: [0.73686479 0.754191 0.73078337 0.72905754 0.73833893 0.74434091
0.7571405 0.74253082 0.72556417 0.75465049]
mean value: 0.7413462525930328
key: test_accuracy
value: [0.703125 0.765625 0.828125 0.84375 0.796875 0.8125
0.80952381 0.79365079 0.73015873 0.76190476]
mean value: 0.7845238095238095
key: train_accuracy
value: [0.86713287 0.87587413 0.86363636 0.86363636 0.86713287 0.87062937
0.87783595 0.86910995 0.86212914 0.87609075]
mean value: 0.8693207752108275
key: test_fscore
value: [0.73972603 0.76190476 0.83076923 0.85714286 0.81690141 0.81818182
0.82352941 0.77966102 0.73846154 0.76923077]
mean value: 0.7935508840252798
key: train_fscore
value: [0.87248322 0.88067227 0.87 0.86824324 0.87375415 0.87625418
0.88175676 0.87603306 0.86587436 0.88067227]
mean value: 0.8745743513896477
key: test_precision
value: [0.65853659 0.77419355 0.81818182 0.78947368 0.74358974 0.79411765
0.75675676 0.82142857 0.72727273 0.75757576]
mean value: 0.7641126839827675
key: train_precision
value: [0.83870968 0.84789644 0.83121019 0.83986928 0.83227848 0.83974359
0.8557377 0.83333333 0.84158416 0.84789644]
mean value: 0.8408259297230265
key: test_recall
value: [0.84375 0.75 0.84375 0.9375 0.90625 0.84375
0.90322581 0.74193548 0.75 0.78125 ]
mean value: 0.830141129032258
key: train_recall
value: [0.90909091 0.91608392 0.91258741 0.8986014 0.91958042 0.91608392
0.90940767 0.92334495 0.89160839 0.91608392]
mean value: 0.9112472892960698
key: test_roc_auc
value: [0.703125 0.765625 0.828125 0.84375 0.796875 0.8125
0.8109879 0.79284274 0.72983871 0.76159274]
mean value: 0.7845262096774194
key: train_roc_auc
value: [0.86713287 0.87587413 0.86363636 0.86363636 0.86713287 0.87062937
0.87778076 0.86901513 0.8621805 0.87616042]
mean value: 0.8693178772447065
key: test_jcc
value: [0.58695652 0.61538462 0.71052632 0.75 0.69047619 0.69230769
0.7 0.63888889 0.58536585 0.625 ]
mean value: 0.6594906078244528
key: train_jcc
value: [0.77380952 0.78678679 0.7699115 0.76716418 0.77581121 0.7797619
0.78851964 0.77941176 0.76347305 0.78678679]
mean value: 0.777143635117412
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04019094 0.0413723 0.04116201 0.04376554 0.04884338 0.04856467
0.04278922 0.04961562 0.04909396 0.04108286]
mean value: 0.044648051261901855
key: score_time
value: [0.01230788 0.01224995 0.01228285 0.0129385 0.01298714 0.01302695
0.02017069 0.01323438 0.01313925 0.02139902]
mean value: 0.01437366008758545
key: test_mcc
value: [0.6882472 0.4459799 0.50051733 0.56950711 0.63213531 0.54098368
0.61371748 0.61028941 0.58615222 0.52312769]
mean value: 0.571065733859239
key: train_mcc
value: [0.69132428 0.68622226 0.68555338 0.67500864 0.65578747 0.65626006
0.66354171 0.67035181 0.65177608 0.69416521]
mean value: 0.672999088350579
key: test_accuracy
value: [0.84090909 0.71590909 0.75 0.78409091 0.81609195 0.77011494
0.8045977 0.8045977 0.79310345 0.75862069]
mean value: 0.78380355276907
key: train_accuracy
value: [0.84478372 0.84223919 0.84223919 0.83715013 0.82719187 0.82719187
0.83100381 0.83481576 0.82465057 0.84625159]
mean value: 0.8357517677526989
key: test_fscore
value: [0.85106383 0.74747475 0.74418605 0.79120879 0.81395349 0.77272727
0.81318681 0.81318681 0.79545455 0.77894737]
mean value: 0.7921389716330991
key: train_fscore
value: [0.85012285 0.84766585 0.84653465 0.84079602 0.83292383 0.83374083
0.83680982 0.83830846 0.83170732 0.85116851]
mean value: 0.8409778137794869
key: test_precision
value: [0.8 0.67272727 0.76190476 0.76595745 0.81395349 0.75555556
0.77083333 0.78723404 0.79545455 0.7254902 ]
mean value: 0.7649110642787695
key: train_precision
value: [0.82185273 0.81947743 0.82409639 0.82238443 0.80714286 0.80424528
0.80997625 0.81995134 0.79859485 0.82380952]
mean value: 0.8151531077013614
key: test_recall
value: [0.90909091 0.84090909 0.72727273 0.81818182 0.81395349 0.79069767
0.86046512 0.84090909 0.79545455 0.84090909]
mean value: 0.823784355179704
key: train_recall
value: [0.88040712 0.8778626 0.87022901 0.86005089 0.86040609 0.86548223
0.86548223 0.85750636 0.86768448 0.88040712]
mean value: 0.8685518141072835
key: test_roc_auc
value: [0.84090909 0.71590909 0.75 0.78409091 0.81606765 0.77034884
0.80523256 0.80417548 0.79307611 0.75766385]
mean value: 0.7837473572938689
key: train_roc_auc
value: [0.84478372 0.84223919 0.84223919 0.83715013 0.82714961 0.82714315
0.83095995 0.83484455 0.82470518 0.84629493]
mean value: 0.8357509590421204
key: test_jcc
value: [0.74074074 0.59677419 0.59259259 0.65454545 0.68627451 0.62962963
0.68518519 0.68518519 0.66037736 0.63793103]
mean value: 0.6569235884204421
key: train_jcc
value: [0.73931624 0.73560768 0.73390558 0.72532189 0.71368421 0.7148847
0.71940928 0.72162741 0.71189979 0.74089936]
mean value: 0.7256556130104113
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
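The lbfgs ConvergenceWarning emitted for this model names two remedies: more iterations or scaled inputs. The numeric columns in this pipeline already pass through `MinMaxScaler`, and the model was built as `LogisticRegression(random_state=42)`, which keeps the default `max_iter=100`, so raising `max_iter` is the remaining option the message itself suggests. A minimal sketch on synthetic data follows. The value `max_iter=5000` and the data are illustrative assumptions, and whether that limit suffices on the real 175-feature set is untested here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix and target
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # Raised iteration limit (assumed value, not taken from the project)
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])
pipe.fit(X, y)

# n_iter_ reports how many iterations the solver actually used
n_iter = int(pipe.named_steps["model"].n_iter_[0])
print("lbfgs iterations used:", n_iter)
```

If `n_iter_` stays below `max_iter`, the solver stopped on its tolerance criterion and not on the iteration cap.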
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.02642417 0.96141696 0.96252251 1.05650187 0.9084363 1.06816649
0.99326634 1.05088854 0.93737698 1.2540741 ]
mean value: 1.021907424926758
key: score_time
value: [0.01944494 0.0148797 0.01482654 0.01350117 0.01510096 0.01505184
0.01477361 0.01542497 0.01489115 0.01478815]
mean value: 0.015268301963806153
key: test_mcc
value: [0.6846532 0.51343603 0.52286233 0.68252363 0.65696218 0.70301836
0.78388673 0.70121639 0.67811839 0.56634733]
mean value: 0.6493024567830749
key: train_mcc
value: [0.77935545 0.79259146 0.80975034 0.80522165 0.787535 0.76785901
0.77980298 0.78559375 0.788271 0.81000038]
mean value: 0.7905981006091508
key: test_accuracy
value: [0.84090909 0.75 0.76136364 0.84090909 0.82758621 0.85057471
0.88505747 0.85057471 0.83908046 0.7816092 ]
mean value: 0.8227664576802508
key: train_accuracy
value: [0.88931298 0.8956743 0.90458015 0.90203562 0.89326557 0.88310038
0.88945362 0.89199492 0.89326557 0.9047014 ]
mean value: 0.894738450197387
key: test_fscore
value: [0.84782609 0.7755102 0.75862069 0.84444444 0.83146067 0.85393258
0.89361702 0.85393258 0.84090909 0.79569892]
mean value: 0.8295952304751271
key: train_fscore
value: [0.89165629 0.89851485 0.90636704 0.90458488 0.8960396 0.88697789
0.89219331 0.89519112 0.89655172 0.90636704]
mean value: 0.8974443750776682
key: test_precision
value: [0.8125 0.7037037 0.76744186 0.82608696 0.80434783 0.82608696
0.82352941 0.84444444 0.84090909 0.75510204]
mean value: 0.8004152291233823
key: train_precision
value: [0.87317073 0.8746988 0.88970588 0.88164251 0.87439614 0.85952381
0.8716707 0.86842105 0.86873508 0.88970588]
mean value: 0.8751670586803703
key: test_recall
value: [0.88636364 0.86363636 0.75 0.86363636 0.86046512 0.88372093
0.97674419 0.86363636 0.84090909 0.84090909]
mean value: 0.8630021141649049
key: train_recall
value: [0.91094148 0.92366412 0.92366412 0.92875318 0.91878173 0.91624365
0.91370558 0.92366412 0.92620865 0.92366412]
mean value: 0.9209290760904664
key: test_roc_auc
value: [0.84090909 0.75 0.76136364 0.84090909 0.82795983 0.85095137
0.88609937 0.85042283 0.8390592 0.78091966]
mean value: 0.8228594080338266
key: train_roc_auc
value: [0.88931298 0.8956743 0.90458015 0.90203562 0.8932331 0.88305821
0.88942277 0.89203511 0.89330737 0.90472546]
mean value: 0.894738507640046
key: test_jcc
value: [0.73584906 0.63333333 0.61111111 0.73076923 0.71153846 0.74509804
0.80769231 0.74509804 0.7254902 0.66071429]
mean value: 0.7106694061272307
key: train_jcc
value: [0.80449438 0.81573034 0.82876712 0.82579186 0.81165919 0.79690949
0.80536913 0.81026786 0.8125 0.82876712]
mean value: 0.8140256490638564
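
Note on the scores above: the first-fold test values are consistent with a single confusion matrix of 39 true positives, 5 false negatives, 9 false positives and 35 true negatives. These counts are inferred from the reported ratios and are not printed anywhere in this log; reading `jcc` as the Jaccard index is likewise an inference from the numbers. A minimal pure-Python sketch of how the metrics follow from such counts:

```python
from math import sqrt

# Confusion counts for fold 1 (assumption: inferred from the logged ratios,
# not read from the log itself).
tp, fn, fp, tn = 39, 5, 9, 35

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fscore = 2 * tp / (2 * tp + fp + fn)
# Jaccard index of the positive class: overlap of predicted and true positives.
jaccard = tp / (tp + fp + fn)
# Matthews correlation coefficient from the four confusion counts.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("fscore", fscore),
                    ("jcc", jaccard), ("mcc", mcc)]:
    print(f"{name}: {value:.8f}")
```

Under these assumed counts the results agree with the first entries of `test_accuracy`, `test_precision`, `test_recall`, `test_fscore`, `test_jcc` and `test_mcc` listed above.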
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
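
Note on the LogisticRegressionCV results above: the sketch below recomputes the `mean value` lines for `test_mcc` and `train_mcc` from the ten fold scores, then compares them with the blind-test MCC. The fold values are copied from this log; the variable names and the idea of reporting the two gaps are illustrative assumptions, not part of the original script. The blind-test MCC is only logged to one decimal place, so the second gap is approximate.

```python
from statistics import mean, stdev

# Fold scores copied from the log entries for LogisticRegressionCV.
test_mcc = [0.6846532, 0.51343603, 0.52286233, 0.68252363, 0.65696218,
            0.70301836, 0.78388673, 0.70121639, 0.67811839, 0.56634733]
train_mcc = [0.77935545, 0.79259146, 0.80975034, 0.80522165, 0.787535,
             0.76785901, 0.77980298, 0.78559375, 0.788271, 0.81000038]
blind_mcc = 0.4  # as logged, rounded to one decimal place

mean_test = mean(test_mcc)
mean_train = mean(train_mcc)
spread_test = stdev(test_mcc)  # sample standard deviation across folds

# Train minus CV-test: a rough indicator of overfitting within the training set.
gap_train_test = mean_train - mean_test
# CV-test minus blind test: how far CV performance carries over to held-out data.
gap_cv_blind = mean_test - blind_mcc

print(f"mean test MCC:  {mean_test:.4f} (sd {spread_test:.4f})")
print(f"mean train MCC: {mean_train:.4f}")
print(f"train - test gap: {gap_train_test:.4f}")
print(f"CV test - blind gap: {gap_cv_blind:.4f}")
```

The two means reproduce the logged `mean value` lines. The larger of the two gaps is the one between cross-validated and blind-test MCC.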
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01627254 0.0121696 0.0112617 0.01123834 0.01204562 0.01295519
0.0131247 0.01328087 0.01170492 0.01157737]
mean value: 0.012563085556030274
key: score_time
value: [0.01252651 0.00963211 0.00918579 0.00921011 0.01000023 0.01023221
0.01029634 0.01028538 0.00933933 0.00943708]
mean value: 0.010014510154724121
key: test_mcc
value: [0.34869484 0.32357511 0.47739604 0.50847518 0.41659257 0.54016913
0.35695404 0.37916452 0.33315711 0.24154334]
mean value: 0.39257218826227025
key: train_mcc
value: [0.39046926 0.48153999 0.43592814 0.43606986 0.4002895 0.45727191
0.45958799 0.46528515 0.44934693 0.45720637]
mean value: 0.4432995086452203
key: test_accuracy
value: [0.65909091 0.65909091 0.73863636 0.75 0.70114943 0.77011494
0.67816092 0.68965517 0.66666667 0.62068966]
mean value: 0.6933254963427378
key: train_accuracy
value: [0.68575064 0.74045802 0.71755725 0.71755725 0.69758577 0.72808132
0.72935197 0.73189327 0.72426938 0.72808132]
mean value: 0.7200586179358598
key: test_fscore
value: [0.71698113 0.6875 0.74157303 0.77083333 0.64864865 0.76744186
0.68181818 0.69662921 0.6741573 0.62068966]
mean value: 0.7006272362074963
key: train_fscore
value: [0.72767365 0.74689826 0.72592593 0.72660099 0.67217631 0.7377451
0.73800738 0.74173807 0.73176761 0.7364532 ]
mean value: 0.7284986492626067
key: test_precision
value: [0.61290323 0.63461538 0.73333333 0.71153846 0.77419355 0.76744186
0.66666667 0.68888889 0.66666667 0.62790698]
mean value: 0.6884155013112252
key: train_precision
value: [0.64202335 0.72881356 0.70503597 0.70405728 0.73493976 0.71327014
0.71599045 0.71462264 0.71153846 0.71360382]
mean value: 0.7083895432425341
key: test_recall
value: [0.86363636 0.75 0.75 0.84090909 0.55813953 0.76744186
0.69767442 0.70454545 0.68181818 0.61363636]
mean value: 0.7227801268498943
key: train_recall
value: [0.83969466 0.76590331 0.7480916 0.75063613 0.61928934 0.76395939
0.76142132 0.77099237 0.75318066 0.76081425]
mean value: 0.7533983027860658
key: test_roc_auc
value: [0.65909091 0.65909091 0.73863636 0.75 0.69952431 0.77008457
0.67838266 0.68948203 0.66649049 0.62077167]
mean value: 0.6931553911205074
key: train_roc_auc
value: [0.68575064 0.74045802 0.71755725 0.71755725 0.69768538 0.72803568
0.72931117 0.73194288 0.72430607 0.72812286]
mean value: 0.7200727192880485
key: test_jcc
value: [0.55882353 0.52380952 0.58928571 0.62711864 0.48 0.62264151
0.51724138 0.53448276 0.50847458 0.45 ]
mean value: 0.5411877635210982
key: train_jcc
value: [0.57192374 0.5960396 0.56976744 0.57059961 0.50622407 0.58446602
0.58479532 0.58949416 0.57699805 0.582846 ]
mean value: 0.5733154027924497
MCC on Blind test: 0.54
Accuracy on Blind test: 0.77
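The first Gaussian NB fold above can be cross-checked by hand. The confusion counts used below (TP=38, FN=6, FP=24, TN=20) are not printed anywhere in this log; they are inferred from the logged recall (0.86363636 = 38/44) and precision (0.61290323 = 38/62), assuming an 88-sample fold with 44 samples per class. A minimal sketch under those assumptions, using only the standard metric definitions (the helper name confusion_metrics is hypothetical):

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Return the metrics reported in this log for one binary confusion matrix."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),
        # MCC is conventionally set to 0 when any marginal total is zero
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

# Inferred counts for Gaussian NB, fold 1 (an assumption, not logged output)
fold1 = confusion_metrics(tp=38, fn=6, fp=24, tn=20)
for name, value in fold1.items():
    print(f"{name}: {value:.8f}")
```

With these counts every value agrees with the first entry of the corresponding test_* array above, which suggests the inferred counts are right, although the log itself does not confirm them.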
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01238036 0.01613355 0.01603031 0.01599598 0.01595688 0.01601386
0.01603103 0.01612544 0.01610541 0.01639628]
mean value: 0.015716910362243652
key: score_time
value: [0.01213884 0.01235485 0.0123291 0.01238704 0.01234341 0.01234961
0.01233554 0.01234794 0.01232505 0.012326 ]
mean value: 0.012323737144470215
key: test_mcc
value: [0.43463356 0.21410373 0.43463356 0.48342972 0.51718675 0.4070455
0.40221987 0.47273749 0.37964137 0.26461585]
mean value: 0.4010247399358443
key: train_mcc
value: [0.47703926 0.49682698 0.44056884 0.47587923 0.46007909 0.46468129
0.45642617 0.47568224 0.45667135 0.49463369]
mean value: 0.4698488137783724
key: test_accuracy
value: [0.71590909 0.60227273 0.71590909 0.73863636 0.75862069 0.70114943
0.70114943 0.73563218 0.68965517 0.63218391]
mean value: 0.6991118077324974
key: train_accuracy
value: [0.73791349 0.7480916 0.72010178 0.73664122 0.72935197 0.73189327
0.72808132 0.73697586 0.72808132 0.74587039]
mean value: 0.7343002221209153
key: test_fscore
value: [0.7311828 0.65346535 0.69879518 0.75789474 0.75294118 0.7173913
0.69767442 0.72941176 0.7032967 0.65217391]
mean value: 0.7094227340267705
key: train_fscore
value: [0.74692875 0.75434243 0.72568579 0.7496977 0.73992674 0.7404674
0.73316708 0.74725275 0.73383085 0.75845411]
mean value: 0.7429753592965128
key: test_precision
value: [0.69387755 0.57894737 0.74358974 0.70588235 0.76190476 0.67346939
0.69767442 0.75609756 0.68085106 0.625 ]
mean value: 0.6917294209042293
key: train_precision
value: [0.72209026 0.73607748 0.71149144 0.71428571 0.71294118 0.71837709
0.72058824 0.71830986 0.71776156 0.72183908]
mean value: 0.7193761896813866
key: test_recall
value: [0.77272727 0.75 0.65909091 0.81818182 0.74418605 0.76744186
0.69767442 0.70454545 0.72727273 0.68181818]
mean value: 0.732293868921776
key: train_recall
value: [0.7735369 0.7735369 0.74045802 0.78880407 0.76903553 0.76395939
0.74619289 0.77862595 0.75063613 0.79898219]
mean value: 0.7683767969930639
key: test_roc_auc
value: [0.71590909 0.60227273 0.71590909 0.73863636 0.75845666 0.70190275
0.70110994 0.73599366 0.68921776 0.63160677]
mean value: 0.6991014799154334
key: train_roc_auc
value: [0.73791349 0.7480916 0.72010178 0.73664122 0.72930148 0.73185247
0.72805828 0.73702871 0.72810994 0.74593779]
mean value: 0.7343036772968574
key: test_jcc
value: [0.57627119 0.48529412 0.53703704 0.61016949 0.60377358 0.55932203
0.53571429 0.57407407 0.54237288 0.48387097]
mean value: 0.5507899660340391
key: train_jcc
value: [0.59607843 0.60557769 0.56947162 0.59961315 0.5872093 0.58789062
0.57874016 0.59649123 0.57956778 0.61089494]
mean value: 0.5911534932157384
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
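The test_jcc and test_fscore arrays in each block are not independent. For a binary problem the Jaccard index J = TP/(TP+FP+FN) and the F1 score F = 2TP/(2TP+FP+FN) are related by J = F/(2 - F). The sketch below checks that identity against the Naive Bayes (BernoulliNB) values logged above; the two lists are copied from the log, and the variable names are illustrative only.

```python
# Per-fold values copied from the Naive Bayes (BernoulliNB) block above
test_fscore = [0.7311828, 0.65346535, 0.69879518, 0.75789474, 0.75294118,
               0.7173913, 0.69767442, 0.72941176, 0.7032967, 0.65217391]
test_jcc = [0.57627119, 0.48529412, 0.53703704, 0.61016949, 0.60377358,
            0.55932203, 0.53571429, 0.57407407, 0.54237288, 0.48387097]

# Jaccard index implied by each F1 score: J = F / (2 - F)
jcc_from_f = [f / (2 - f) for f in test_fscore]

# Largest disagreement between the derived and the logged Jaccard values
max_abs_diff = max(abs(a - b) for a, b in zip(jcc_from_f, test_jcc))
print(f"max |J(F) - logged J| = {max_abs_diff:.2e}")
```

Because one metric is a monotone function of the other, ranking models by F1 or by Jaccard index fold by fold gives the same order.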
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01603746 0.01224732 0.01173258 0.01174664 0.01122117 0.01208568
0.01260567 0.01258731 0.01110101 0.01235771]
mean value: 0.012372255325317383
key: score_time
value: [0.04256272 0.02004099 0.01386952 0.01411438 0.01441169 0.01482582
0.01483512 0.01415348 0.01409531 0.01477814]
mean value: 0.017768716812133788
key: test_mcc
value: [0.34530694 0.37340802 0.18257419 0.41294832 0.51879367 0.35695404
0.33350951 0.38257713 0.28738215 0.40515647]
mean value: 0.3598610447642626
key: train_mcc
value: [0.58731594 0.621592 0.59890351 0.58731594 0.60505532 0.60261756
0.62543924 0.58154252 0.58881754 0.59249991]
mean value: 0.5991099467682587
key: test_accuracy
value: [0.67045455 0.68181818 0.59090909 0.70454545 0.75862069 0.67816092
0.66666667 0.68965517 0.64367816 0.70114943]
mean value: 0.6785658307210032
key: train_accuracy
value: [0.79262087 0.81043257 0.79898219 0.79262087 0.80177891 0.80050826
0.81194409 0.79034307 0.79415502 0.79542567]
mean value: 0.7988811507609339
key: test_fscore
value: [0.69473684 0.71428571 0.57142857 0.72340426 0.76404494 0.68181818
0.66666667 0.6746988 0.65934066 0.72340426]
mean value: 0.6873828885284302
key: train_fscore
value: [0.8009768 0.81490683 0.80445545 0.8009768 0.80882353 0.80783354
0.81862745 0.79553903 0.79800499 0.80245399]
mean value: 0.8052598406238634
key: test_precision
value: [0.64705882 0.64814815 0.6 0.68 0.73913043 0.66666667
0.65909091 0.71794872 0.63829787 0.68 ]
mean value: 0.6676341572506888
key: train_precision
value: [0.76995305 0.7961165 0.78313253 0.76995305 0.78199052 0.78014184
0.79146919 0.77536232 0.78239609 0.77488152]
mean value: 0.7805396621320495
key: test_recall
value: [0.75 0.79545455 0.54545455 0.77272727 0.79069767 0.69767442
0.6744186 0.63636364 0.68181818 0.77272727]
mean value: 0.7117336152219873
key: train_recall
value: [0.8346056 0.8346056 0.82697201 0.8346056 0.83756345 0.83756345
0.84771574 0.81679389 0.81424936 0.83206107]
mean value: 0.8316735769364901
key: test_roc_auc
value: [0.67045455 0.68181818 0.59090909 0.70454545 0.7589852 0.67838266
0.66675476 0.69027484 0.64323467 0.70031712]
mean value: 0.6785676532769556
key: train_roc_auc
value: [0.79262087 0.81043257 0.79898219 0.79262087 0.80173338 0.80046112
0.81189858 0.79037664 0.79418052 0.79547216]
mean value: 0.7988778884282042
key: test_jcc
value: [0.53225806 0.55555556 0.4 0.56666667 0.61818182 0.51724138
0.5 0.50909091 0.49180328 0.56666667]
mean value: 0.5257464338676614
key: train_jcc
value: [0.66802444 0.68763103 0.67287785 0.66802444 0.67901235 0.67761807
0.69294606 0.66049383 0.66390041 0.67008197]
mean value: 0.6740610436778488
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
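In every block the test_roc_auc values equal test_accuracy exactly for the 88-sample folds and differ only slightly for the 87-sample folds. That pattern is consistent with ROC AUC being computed from hard class predictions rather than from probabilities or decision scores, in which case it reduces to balanced accuracy, (TPR + TNR)/2. The generating script is not shown here, so this is an inference. The counts below are likewise inferred from the logged K-Nearest Neighbors recall and precision (fold 1: 33/44 and 33/51; fold 5: 34/43 and 34/46) and are assumptions, as is the helper name.

```python
def accuracy_and_balanced_accuracy(tp, fn, fp, tn):
    """Return (accuracy, balanced accuracy) for one binary confusion matrix."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return accuracy, (tpr + tnr) / 2

# Inferred counts (assumptions): fold 1 has 44 + 44 samples, fold 5 has 43 + 44
fold1 = accuracy_and_balanced_accuracy(tp=33, fn=11, fp=18, tn=26)
fold5 = accuracy_and_balanced_accuracy(tp=34, fn=9, fp=12, tn=32)

print("fold 1: accuracy=%.8f balanced=%.8f" % fold1)
print("fold 5: accuracy=%.8f balanced=%.8f" % fold5)
```

If that reading is correct, the roc_auc columns in this log carry no ranking information beyond the hard predictions, and a score-based AUC would have to be computed separately.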
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.05328178 0.05062747 0.04658628 0.0554626 0.05192924 0.04524851
0.04780626 0.04653692 0.04967833 0.05186844]
mean value: 0.0499025821685791
key: score_time
value: [0.01994634 0.01860738 0.01801848 0.02021909 0.01826525 0.01802874
0.01815724 0.01966429 0.0205009 0.01840186]
mean value: 0.018980956077575682
key: test_mcc
value: [0.66143783 0.38357064 0.48342972 0.53987041 0.65539112 0.57138821
0.55620192 0.58821234 0.62173301 0.39528559]
mean value: 0.5456520787005305
key: train_mcc
value: [0.63985567 0.65908472 0.67484545 0.65836096 0.63563364 0.66781139
0.64200177 0.6513967 0.63924521 0.65891801]
mean value: 0.6527153529678907
key: test_accuracy
value: [0.81818182 0.68181818 0.73863636 0.76136364 0.82758621 0.7816092
0.77011494 0.79310345 0.8045977 0.68965517]
mean value: 0.7666666666666667
key: train_accuracy
value: [0.81552163 0.82315522 0.83333333 0.82569975 0.81321474 0.82846252
0.81702668 0.82210928 0.81448539 0.82337992]
mean value: 0.8216388449712406
key: test_fscore
value: [0.84 0.7254902 0.75789474 0.78787879 0.82758621 0.79569892
0.79166667 0.80434783 0.82474227 0.73267327]
mean value: 0.7887978880548652
key: train_fscore
value: [0.82961222 0.83893395 0.84533648 0.83748517 0.82807018 0.84284051
0.83058824 0.83412322 0.82943925 0.83855981]
mean value: 0.8354989038165057
key: test_precision
value: [0.75 0.63793103 0.70588235 0.70909091 0.81818182 0.74
0.71698113 0.77083333 0.75471698 0.64912281]
mean value: 0.7252740368255087
key: train_precision
value: [0.77074236 0.77021277 0.78854626 0.78444444 0.76789588 0.77849462
0.77412281 0.7804878 0.76673866 0.77136752]
mean value: 0.7753053120338202
key: test_recall
value: [0.95454545 0.84090909 0.81818182 0.88636364 0.8372093 0.86046512
0.88372093 0.84090909 0.90909091 0.84090909]
mean value: 0.86723044397463
key: train_recall
value: [0.89821883 0.92111959 0.91094148 0.89821883 0.89847716 0.91878173
0.89593909 0.8956743 0.90330789 0.91857506]
mean value: 0.9059253949186913
key: test_roc_auc
value: [0.81818182 0.68181818 0.73863636 0.76136364 0.82769556 0.78250529
0.77140592 0.79254757 0.80338266 0.68789641]
mean value: 0.7665433403805497
key: train_roc_auc
value: [0.81552163 0.82315522 0.83333333 0.82569975 0.81310626 0.82834761
0.81692629 0.82220263 0.81459811 0.82350073]
mean value: 0.8216391547512949
key: test_jcc
value: [0.72413793 0.56923077 0.61016949 0.65 0.70588235 0.66071429
0.65517241 0.67272727 0.70175439 0.578125 ]
mean value: 0.6527913902931426
key: train_jcc
value: [0.70883534 0.72255489 0.73210634 0.72040816 0.70658683 0.72837022
0.71026157 0.71544715 0.70858283 0.722 ]
mean value: 0.7175153340213286
MCC on Blind test: 0.37
Accuracy on Blind test: 0.69
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [2.5462265 1.42994976 2.20784569 1.73624086 1.58668661 2.60854244
2.25105 2.60767412 2.84883237 1.98630857]
mean value: 2.1809356927871706
key: score_time
value: [0.01277137 0.01260996 0.0125699 0.01260591 0.01268101 0.01257396
0.01291466 0.01257181 0.01335096 0.01339889]
mean value: 0.012804841995239258
key: test_mcc
value: [0.63900965 0.51745489 0.45883147 0.54772256 0.63444041 0.70301836
0.67803941 0.75240169 0.67866682 0.56484984]
mean value: 0.6174435091168581
key: train_mcc
value: [0.92880129 0.80296278 0.89632742 0.8778626 0.83488556 0.90993013
0.89733331 0.92697253 0.94487561 0.90880964]
mean value: 0.892876085727712
key: test_accuracy
value: [0.81818182 0.73863636 0.72727273 0.77272727 0.81609195 0.85057471
0.82758621 0.87356322 0.83908046 0.7816092 ]
mean value: 0.8045323928944619
key: train_accuracy
value: [0.96437659 0.89821883 0.94783715 0.9389313 0.91740788 0.95425667
0.94790343 0.96315121 0.97204574 0.95425667]
mean value: 0.9458385468700997
key: test_fscore
value: [0.82608696 0.78095238 0.70731707 0.7826087 0.80487805 0.85393258
0.84536082 0.86746988 0.84444444 0.77647059]
mean value: 0.8089521476287256
key: train_fscore
value: [0.96455696 0.90430622 0.94682231 0.9389313 0.91698595 0.95555556
0.94944513 0.96238651 0.97256858 0.95477387]
mean value: 0.9466332383939996
key: test_precision
value: [0.79166667 0.67213115 0.76315789 0.75 0.84615385 0.82608696
0.75925926 0.92307692 0.82608696 0.80487805]
mean value: 0.7962497699258487
key: train_precision
value: [0.95969773 0.85327314 0.96560847 0.9389313 0.92287918 0.93028846
0.92326139 0.98148148 0.95354523 0.94292804]
mean value: 0.9371894417274584
key: test_recall
value: [0.86363636 0.93181818 0.65909091 0.81818182 0.76744186 0.88372093
0.95348837 0.81818182 0.86363636 0.75 ]
mean value: 0.8309196617336152
key: train_recall
value: [0.96946565 0.96183206 0.92875318 0.9389313 0.91116751 0.9822335
0.97715736 0.94402036 0.99236641 0.96692112]
mean value: 0.9572848451970396
key: test_roc_auc
value: [0.81818182 0.73863636 0.72727273 0.77272727 0.81553911 0.85095137
0.82901691 0.87420719 0.83879493 0.78197674]
mean value: 0.80473044397463
key: train_roc_auc
value: [0.96437659 0.89821883 0.94783715 0.9389313 0.91741582 0.95422108
0.94786621 0.96312693 0.97207153 0.95427274]
mean value: 0.9458338176980405
key: test_jcc
value: [0.7037037 0.640625 0.54716981 0.64285714 0.67346939 0.74509804
0.73214286 0.76595745 0.73076923 0.63461538]
mean value: 0.6816408004188372
key: train_jcc
value: [0.93154034 0.82532751 0.89901478 0.88489209 0.84669811 0.91489362
0.90375587 0.9275 0.94660194 0.91346154]
mean value: 0.8993685796853914
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06037998 0.04592085 0.0504756 0.05603194 0.04493618 0.04983497
0.04593444 0.04674339 0.04872298 0.04878926]
mean value: 0.04977695941925049
key: score_time
value: [0.00964427 0.00910902 0.00899935 0.00910282 0.00918436 0.00902581
0.00907683 0.00908375 0.00915027 0.0095067 ]
mean value: 0.009188318252563476
key: test_mcc
value: [0.84287052 0.70618882 0.75488987 0.6846532 0.77102073 0.84118687
0.77102073 0.79334038 0.77359882 0.67811839]
mean value: 0.7616888342891726
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92045455 0.85227273 0.875 0.84090909 0.88505747 0.91954023
0.88505747 0.89655172 0.88505747 0.83908046]
mean value: 0.879898119122257
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91764706 0.85714286 0.88172043 0.84782609 0.88636364 0.92134831
0.88636364 0.89655172 0.88095238 0.84090909]
mean value: 0.8816825216363853
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95121951 0.82978723 0.83673469 0.8125 0.86666667 0.89130435
0.86666667 0.90697674 0.925 0.84090909]
mean value: 0.8727764956369783
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88636364 0.88636364 0.93181818 0.88636364 0.90697674 0.95348837
0.90697674 0.88636364 0.84090909 0.84090909]
mean value: 0.8926532769556025
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92045455 0.85227273 0.875 0.84090909 0.88530655 0.919926
0.88530655 0.89667019 0.88557082 0.8390592 ]
mean value: 0.8800475687103593
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84782609 0.75 0.78846154 0.73584906 0.79591837 0.85416667
0.79591837 0.8125 0.78723404 0.7254902 ]
mean value: 0.7893364322014
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
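The six complete blocks above (Gaussian NB through Decision Tree) can be compared on three MCC figures each: mean train_mcc, mean test_mcc from cross-validation, and the blind-test MCC. The sketch below tabulates them; the numbers are copied from the log and rounded to four decimals, while the dictionary layout and variable names are illustrative assumptions.

```python
# (mean train_mcc, mean test_mcc, blind-test MCC), copied from the blocks above
results = {
    "Gaussian NB": (0.4433, 0.3926, 0.54),
    "Naive Bayes": (0.4698, 0.4010, 0.43),
    "K-Nearest Neighbors": (0.5991, 0.3599, 0.29),
    "SVM": (0.6527, 0.5457, 0.37),
    "MLP": (0.8929, 0.6174, 0.47),
    "Decision Tree": (1.0, 0.7617, 0.67),
}

# Rank models by blind-test MCC, best first
by_blind = sorted(results, key=lambda m: results[m][2], reverse=True)

# Train minus CV-test gap, and CV-test minus blind-test drop, per model
overfit_gap = {m: round(tr - te, 4) for m, (tr, te, _) in results.items()}
blind_drop = {m: round(te - bl, 4) for m, (_, te, bl) in results.items()}

for m in by_blind:
    tr, te, bl = results[m]
    print(f"{m:20s} train={tr:.4f} cv={te:.4f} blind={bl:.2f} "
          f"gap={overfit_gap[m]:+.4f} drop={blind_drop[m]:+.4f}")
```

On these figures the Decision Tree has the highest blind-test MCC of the six despite a perfect training score, the MLP shows the largest train-to-CV gap, and the SVM loses the most between cross-validation and the blind test.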
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.17561674 0.17232823 0.17396235 0.17050743 0.1736331 0.17472339
0.17230892 0.17626953 0.17193913 0.17061377]
mean value: 0.17319025993347167
key: score_time
value: [0.02014542 0.02008057 0.02028108 0.01903844 0.01922488 0.01906919
0.02035928 0.01956439 0.02045226 0.01898384]
mean value: 0.019719934463500975
key: test_mcc
value: [0.57551157 0.48038446 0.54772256 0.56950711 0.72689655 0.70301836
0.67811839 0.67866682 0.65539112 0.61028941]
mean value: 0.6225506347313268
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.78409091 0.73863636 0.77272727 0.78409091 0.86206897 0.85057471
0.83908046 0.83908046 0.82758621 0.8045977 ]
mean value: 0.810253396029258
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.75268817 0.7826087 0.79120879 0.85365854 0.85393258
0.8372093 0.84444444 0.82758621 0.81318681]
mean value: 0.8156523546612395
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.74509804 0.71428571 0.75 0.76595745 0.8974359 0.82608696
0.8372093 0.82608696 0.8372093 0.78723404]
mean value: 0.7986603657993642
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.86363636 0.79545455 0.81818182 0.81818182 0.81395349 0.88372093
0.8372093 0.86363636 0.81818182 0.84090909]
mean value: 0.8353065539112051
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78409091 0.73863636 0.77272727 0.78409091 0.8615222 0.85095137
0.8390592 0.83879493 0.82769556 0.80417548]
mean value: 0.8101744186046512
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.60344828 0.64285714 0.65454545 0.74468085 0.74509804
0.72 0.73076923 0.70588235 0.68518519]
mean value: 0.6899133199106442
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
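Each metric in this log is printed as a `key:` / `value:` / `mean value:` triple. The sketch below parses one such triple (the Extra Trees `test_mcc` block, copied from above) and recomputes the mean from the per-fold values. The helper name `parse_metric` is hypothetical and is not part of the original script.

```python
import re
from statistics import fmean

# Copied from the Extra Trees block of this log.
log_excerpt = """\
key: test_mcc
value: [0.57551157 0.48038446 0.54772256 0.56950711 0.72689655 0.70301836
 0.67811839 0.67866682 0.65539112 0.61028941]
mean value: 0.6225506347313268
"""


def parse_metric(block: str):
    """Return (key, fold_values, logged_mean) for one key/value/mean triple."""
    key = re.search(r"key:\s*(\S+)", block).group(1)
    # The value list may wrap across lines, hence re.S.
    values_text = re.search(r"value:\s*\[(.*?)\]", block, re.S).group(1)
    values = [float(tok) for tok in values_text.split()]
    logged_mean = float(re.search(r"mean value:\s*(\S+)", block).group(1))
    return key, values, logged_mean


key, values, logged_mean = parse_metric(log_excerpt)
recomputed = fmean(values)
print(key, len(values), recomputed, logged_mean)
```

The recomputed mean agrees with the logged one only up to the eight decimals printed per fold, so any comparison should use a tolerance rather than exact equality.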
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01173329 0.01363635 0.01289082 0.01202559 0.0125587 0.01259565
0.0118103 0.01261592 0.01333976 0.01175308]
mean value: 0.012495946884155274
key: score_time
value: [0.00911117 0.00918341 0.00908852 0.00974941 0.00897264 0.0091536
0.00952911 0.00908566 0.0098989 0.00913501]
mean value: 0.009290742874145507
key: test_mcc
value: [0.62155249 0.33071891 0.46225016 0.46225016 0.41045404 0.61371748
0.5404983 0.44952813 0.42577098 0.49974958]
mean value: 0.48164902399471576
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.80681818 0.65909091 0.72727273 0.72727273 0.70114943 0.8045977
0.77011494 0.72413793 0.71264368 0.74712644]
mean value: 0.7380224660397074
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82105263 0.7 0.75 0.75 0.72340426 0.81318681
0.76190476 0.73913043 0.72527473 0.73170732]
mean value: 0.7515660939120177
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.76470588 0.625 0.69230769 0.69230769 0.66666667 0.77083333
0.7804878 0.70833333 0.70212766 0.78947368]
mean value: 0.7192243748964702
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.88636364 0.79545455 0.81818182 0.81818182 0.79069767 0.86046512
0.74418605 0.77272727 0.75 0.68181818]
mean value: 0.7918076109936575
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.80681818 0.65909091 0.72727273 0.72727273 0.70216702 0.80523256
0.7698203 0.72357294 0.7122093 0.74788584]
mean value: 0.7381342494714588
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.69642857 0.53846154 0.6 0.6 0.56666667 0.68518519
0.61538462 0.5862069 0.56896552 0.57692308]
mean value: 0.6034222067842757
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
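In the blocks above, per-fold `test_roc_auc` is nearly identical to `test_accuracy` (both are 0.80681818 in the first Extra Tree fold). One possible explanation, which the log does not confirm, is that ROC AUC was scored on hard class labels rather than on probabilities. In that case AUC reduces to balanced accuracy, which equals plain accuracy when the classes are balanced. A minimal sketch of that identity, using hypothetical labels that are not taken from this run:

```python
def auc_from_scores(y_true, scores):
    """ROC AUC as P(positive outranks negative), with ties counted as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical labels and hard 0/1 predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

auc_hard = auc_from_scores(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(auc_hard, bal_acc)
```

If probability scores were passed instead of hard labels, the two quantities would generally differ.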
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.66490698 2.55711341 2.59944248 2.58720875 2.57398891 2.7415657
2.7710402 2.6602118 2.64705348 2.56468678]
mean value: 2.6367218494415283
key: score_time
value: [0.09884381 0.09871507 0.10496402 0.10470939 0.10334086 0.10789132
0.10715008 0.10783505 0.0989809 0.09773517]
mean value: 0.1030165672302246
key: test_mcc
value: [0.90909091 0.78582528 0.75488987 0.75488987 0.83923862 0.84118687
0.87056589 0.81606765 0.79323121 0.86289151]
mean value: 0.8227877669037194
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95454545 0.88636364 0.875 0.875 0.91954023 0.91954023
0.93103448 0.90804598 0.89655172 0.93103448]
mean value: 0.9096656217345872
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95454545 0.89583333 0.88172043 0.88172043 0.91764706 0.92134831
0.93478261 0.90909091 0.8988764 0.93333333]
mean value: 0.9128898277138389
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95454545 0.82692308 0.83673469 0.83673469 0.92857143 0.89130435
0.87755102 0.90909091 0.88888889 0.91304348]
mean value: 0.886338799226998
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.97727273 0.93181818 0.93181818 0.90697674 0.95348837
1. 0.90909091 0.90909091 0.95454545]
mean value: 0.9428646934460888
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95454545 0.88636364 0.875 0.875 0.91939746 0.919926
0.93181818 0.90803383 0.89640592 0.9307611 ]
mean value: 0.9097251585623679
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91304348 0.81132075 0.78846154 0.78846154 0.84782609 0.85416667
0.87755102 0.83333333 0.81632653 0.875 ]
mean value: 0.8405490947877857
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.64
Accuracy on Blind test: 0.82
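For a binary task, the `test_jcc` (Jaccard) and `test_fscore` (F1) values logged per fold are tied by the identity J = F1 / (2 - F1), so either column can be derived from the other. A minimal sketch checking this on the first three Random Forest folds printed above (the function names are hypothetical):

```python
def jaccard_from_f1(f1: float) -> float:
    """Jaccard index implied by a binary F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


def f1_from_jaccard(j: float) -> float:
    """Inverse relation: F1 = 2J / (1 + J)."""
    return 2.0 * j / (1.0 + j)


# First three Random Forest folds, copied from the log above.
fold_f1 = [0.95454545, 0.89583333, 0.88172043]
fold_jcc = [0.91304348, 0.81132075, 0.78846154]

for f1, jcc in zip(fold_f1, fold_jcc):
    print(f"F1={f1:.8f}  logged J={jcc:.8f}  derived J={jaccard_from_f1(f1):.8f}")
```

Because the mapping is monotonic, ranking models by F1 or by Jaccard on a single fold gives the same order; the averages over folds can still differ slightly.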
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...05', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.07194018 1.12413049 1.11955142 1.11322117 1.15317225 1.10507035
1.20874882 1.12016249 1.10679674 1.14551282]
mean value: 1.1268306732177735
key: score_time
value: [0.21674871 0.26874399 0.25944543 0.24885511 0.28227925 0.21746516
0.27740693 0.26276541 0.28292441 0.24287367]
mean value: 0.25595080852508545
key: test_mcc
value: [0.91003151 0.8057162 0.73029674 0.77594029 0.81683533 0.77008457
0.82421385 0.83923862 0.77008457 0.86289151]
mean value: 0.8105333182128026
key: train_mcc
value: [0.90897389 0.90882071 0.90609005 0.90897389 0.91388817 0.91140754
0.90620623 0.91910899 0.90621167 0.91402422]
mean value: 0.9103705353608575
key: test_accuracy
value: [0.95454545 0.89772727 0.86363636 0.88636364 0.90804598 0.88505747
0.90804598 0.91954023 0.88505747 0.93103448]
mean value: 0.903905433646813
key: train_accuracy
value: [0.95419847 0.95419847 0.95292621 0.95419847 0.95679797 0.95552732
0.95298602 0.95933926 0.95298602 0.95679797]
mean value: 0.9549956190125157
key: test_fscore
value: [0.95555556 0.90526316 0.86956522 0.89130435 0.9047619 0.88372093
0.91304348 0.92134831 0.88636364 0.93333333]
mean value: 0.9064259876226728
key: train_fscore
value: [0.955 0.95488722 0.95345912 0.955 0.95739348 0.95619524
0.95357591 0.95989975 0.95345912 0.95739348]
mean value: 0.9556263327547102
key: test_precision
value: [0.93478261 0.84313725 0.83333333 0.85416667 0.92682927 0.88372093
0.85714286 0.91111111 0.88636364 0.91304348]
mean value: 0.8843631145001328
key: train_precision
value: [0.93857494 0.94074074 0.94278607 0.93857494 0.94554455 0.94320988
0.94292804 0.94567901 0.94278607 0.94320988]
mean value: 0.9424034116783878
key: test_recall
value: [0.97727273 0.97727273 0.90909091 0.93181818 0.88372093 0.88372093
0.97674419 0.93181818 0.88636364 0.95454545]
mean value: 0.9312367864693446
key: train_recall
value: [0.97201018 0.96946565 0.96437659 0.97201018 0.96954315 0.96954315
0.96446701 0.97455471 0.96437659 0.97201018]
mean value: 0.9692357370739205
key: test_roc_auc
value: [0.95454545 0.89772727 0.86363636 0.88636364 0.90776956 0.88504228
0.90882664 0.91939746 0.88504228 0.9307611 ]
mean value: 0.9039112050739958
key: train_roc_auc
value: [0.95419847 0.95419847 0.95292621 0.95419847 0.95678175 0.95550949
0.95297142 0.95935857 0.95300048 0.95681727]
mean value: 0.954996060500381
key: test_jcc
value: [0.91489362 0.82692308 0.76923077 0.80392157 0.82608696 0.79166667
0.84 0.85416667 0.79591837 0.875 ]
mean value: 0.8297807689004585
key: train_jcc
value: [0.9138756 0.91366906 0.91105769 0.9138756 0.91826923 0.91606715
0.91127098 0.92289157 0.91105769 0.91826923]
mean value: 0.915030380283576
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
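The FutureWarning in this block says that `max_features='auto'` is deprecated and that `'sqrt'` preserves the past behaviour for random forest classifiers. A minimal sketch of updating the "Random Forest2" settings accordingly is shown below. The helper `modernize_forest_params` is hypothetical and not part of the original script; the parameter values are copied from the `Model func:` line above.

```python
def modernize_forest_params(params: dict) -> dict:
    """Return a copy of params with max_features='auto' replaced by 'sqrt'."""
    updated = dict(params)  # leave the caller's dict untouched
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Settings of the "Random Forest2" entry as printed in this log.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

new_params = modernize_forest_params(rf2_params)
print(new_params)
# With scikit-learn installed, the result could be unpacked into the estimator:
#   RandomForestClassifier(**new_params)
```

Dropping the `max_features` key entirely would have the same effect, since the warning states that `'sqrt'` is also the default for this estimator.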
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02789688 0.01565957 0.0157845 0.01578856 0.01591611 0.01583362
0.01590157 0.01752901 0.01769686 0.0173285 ]
mean value: 0.017533516883850096
key: score_time
value: [0.01299381 0.01210165 0.0122776 0.01220608 0.01224947 0.01215458
0.01221299 0.01325798 0.01327944 0.01305866]
mean value: 0.012579226493835449
key: test_mcc
value: [0.43463356 0.21410373 0.43463356 0.48342972 0.51718675 0.4070455
0.40221987 0.47273749 0.37964137 0.26461585]
mean value: 0.4010247399358443
key: train_mcc
value: [0.47703926 0.49682698 0.44056884 0.47587923 0.46007909 0.46468129
0.45642617 0.47568224 0.45667135 0.49463369]
mean value: 0.4698488137783724
key: test_accuracy
value: [0.71590909 0.60227273 0.71590909 0.73863636 0.75862069 0.70114943
0.70114943 0.73563218 0.68965517 0.63218391]
mean value: 0.6991118077324974
key: train_accuracy
value: [0.73791349 0.7480916 0.72010178 0.73664122 0.72935197 0.73189327
0.72808132 0.73697586 0.72808132 0.74587039]
mean value: 0.7343002221209153
key: test_fscore
value: [0.7311828 0.65346535 0.69879518 0.75789474 0.75294118 0.7173913
0.69767442 0.72941176 0.7032967 0.65217391]
mean value: 0.7094227340267705
key: train_fscore
value: [0.74692875 0.75434243 0.72568579 0.7496977 0.73992674 0.7404674
0.73316708 0.74725275 0.73383085 0.75845411]
mean value: 0.7429753592965128
key: test_precision
value: [0.69387755 0.57894737 0.74358974 0.70588235 0.76190476 0.67346939
0.69767442 0.75609756 0.68085106 0.625 ]
mean value: 0.6917294209042293
key: train_precision
value: [0.72209026 0.73607748 0.71149144 0.71428571 0.71294118 0.71837709
0.72058824 0.71830986 0.71776156 0.72183908]
mean value: 0.7193761896813866
key: test_recall
value: [0.77272727 0.75 0.65909091 0.81818182 0.74418605 0.76744186
0.69767442 0.70454545 0.72727273 0.68181818]
mean value: 0.732293868921776
key: train_recall
value: [0.7735369 0.7735369 0.74045802 0.78880407 0.76903553 0.76395939
0.74619289 0.77862595 0.75063613 0.79898219]
mean value: 0.7683767969930639
key: test_roc_auc
value: [0.71590909 0.60227273 0.71590909 0.73863636 0.75845666 0.70190275
0.70110994 0.73599366 0.68921776 0.63160677]
mean value: 0.6991014799154334
key: train_roc_auc
value: [0.73791349 0.7480916 0.72010178 0.73664122 0.72930148 0.73185247
0.72805828 0.73702871 0.72810994 0.74593779]
mean value: 0.7343036772968574
key: test_jcc
value: [0.57627119 0.48529412 0.53703704 0.61016949 0.60377358 0.55932203
0.53571429 0.57407407 0.54237288 0.48387097]
mean value: 0.5507899660340391
key: train_jcc
value: [0.59607843 0.60557769 0.56947162 0.59961315 0.5872093 0.58789062
0.57874016 0.59649123 0.57956778 0.61089494]
mean value: 0.5911534932157384
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.1539228 0.13720822 0.14518332 0.12025738 0.13158059 0.13478374
0.12884831 0.12683439 0.12769008 0.13258982]
mean value: 0.13388986587524415
key: score_time
value: [0.01118755 0.01118207 0.0112083 0.01213813 0.01123714 0.01117682
0.01118374 0.0114069 0.01129794 0.01113939]
mean value: 0.01131579875946045
key: test_mcc
value: [0.93205893 0.82589664 0.79730996 0.82158384 0.79334038 0.81702814
0.87056589 0.83923862 0.81702814 0.86205074]
mean value: 0.8376101275955953
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96590909 0.90909091 0.89772727 0.90909091 0.89655172 0.90804598
0.93103448 0.91954023 0.90804598 0.93103448]
mean value: 0.9176071055381401
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.96551724 0.91489362 0.9010989 0.91304348 0.89655172 0.90909091
0.93478261 0.92134831 0.90697674 0.93181818]
mean value: 0.919512172029582
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.97674419 0.86 0.87234043 0.875 0.88636364 0.88888889
0.87755102 0.91111111 0.92857143 0.93181818]
mean value: 0.9008388878739837
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.97727273 0.93181818 0.95454545 0.90697674 0.93023256
1. 0.93181818 0.88636364 0.93181818]
mean value: 0.9405391120507399
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.96590909 0.90909091 0.89772727 0.90909091 0.89667019 0.9082981
0.93181818 0.91939746 0.9082981 0.93102537]
mean value: 0.9177325581395349
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.93333333 0.84313725 0.82 0.84 0.8125 0.83333333
0.87755102 0.85416667 0.82978723 0.87234043]
mean value: 0.8516149268217925
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.87
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.08155274 0.07502103 0.05325723 0.0756371 0.05765128 0.13894176
0.07617521 0.08730435 0.1101017 0.07325125]
mean value: 0.08288936614990235
key: score_time
value: [0.01919389 0.01230693 0.01228714 0.01243138 0.01909542 0.03655434
0.01920414 0.02307391 0.01900601 0.01966429]
mean value: 0.019281744956970215
key: test_mcc
value: [0.6882472 0.37796447 0.45643546 0.59648091 0.65994555 0.5504913
0.67803941 0.70984404 0.51803019 0.38309043]
mean value: 0.5618568970485822
key: train_mcc
value: [0.7521962 0.74937604 0.79308611 0.76606427 0.77295317 0.75613181
0.74697508 0.76400062 0.765522 0.76638343]
mean value: 0.7632688717792019
key: test_accuracy
value: [0.84090909 0.68181818 0.72727273 0.79545455 0.82758621 0.77011494
0.82758621 0.85057471 0.75862069 0.68965517]
mean value: 0.7769592476489028
key: train_accuracy
value: [0.8740458 0.8740458 0.8956743 0.88167939 0.88564168 0.87674714
0.87166455 0.88055909 0.88055909 0.88182973]
mean value: 0.8802446563268895
key: test_fscore
value: [0.85106383 0.72 0.73913043 0.80851064 0.83516484 0.78723404
0.84536082 0.86315789 0.76923077 0.71578947]
mean value: 0.7934642742979832
key: train_fscore
value: [0.88029021 0.8776267 0.89901478 0.88644689 0.88943489 0.8818514
0.87787183 0.88536585 0.88647343 0.88644689]
mean value: 0.8850822856062937
key: test_precision
value: [0.8 0.64285714 0.70833333 0.76 0.79166667 0.7254902
0.75925926 0.80392157 0.74468085 0.66666667]
mean value: 0.7402875684552781
key: train_precision
value: [0.83870968 0.85336538 0.87112172 0.85211268 0.86190476 0.84777518
0.83833718 0.8501171 0.84367816 0.85211268]
mean value: 0.8509234509459607
key: test_recall
value: [0.90909091 0.81818182 0.77272727 0.86363636 0.88372093 0.86046512
0.95348837 0.93181818 0.79545455 0.77272727]
mean value: 0.8561310782241015
key: train_recall
value: [0.92620865 0.90330789 0.92875318 0.92366412 0.91878173 0.91878173
0.9213198 0.92366412 0.93384224 0.92366412]
mean value: 0.922198757443071
key: test_roc_auc
value: [0.84090909 0.68181818 0.72727273 0.79545455 0.8282241 0.77114165
0.82901691 0.84963002 0.75819239 0.68868922]
mean value: 0.7770348837209302
key: train_roc_auc
value: [0.8740458 0.8740458 0.8956743 0.88167939 0.88559951 0.87669366
0.87160137 0.88061379 0.8806267 0.88188282]
mean value: 0.8802463155991269
key: test_jcc
value: [0.74074074 0.5625 0.5862069 0.67857143 0.71698113 0.64912281
0.73214286 0.75925926 0.625 0.55737705]
mean value: 0.6607902170539353
key: train_jcc
value: [0.78617711 0.78193833 0.81655481 0.79605263 0.80088496 0.78867102
0.78232759 0.79431072 0.79609544 0.79605263]
mean value: 0.7939065237534392
MCC on Blind test: 0.51
Accuracy on Blind test: 0.76
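NOTE (editorial sketch, not part of the original run): every per-fold score printed above is a function of one confusion matrix per fold. The counts used below (TP=40, FP=10, FN=4, TN=34) are not printed in the log; they are inferred from the first LDA fold's test_accuracy (74/88), test_recall (40/44) and test_precision (40/50). The helper name fold_metrics is hypothetical. Under that assumption, the standard formulas reproduce the logged test_mcc, test_fscore and test_jcc for that fold.

```python
import math


def fold_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts.

    Assumes every marginal total is non-zero, so no division by zero occurs.
    """
    total = tp + fp + fn + tn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fscore": 2 * tp / (2 * tp + fp + fn),
        "jcc": tp / (tp + fp + fn),  # Jaccard index of the positive class
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Counts inferred from the first LDA test fold (an assumption, see the note above).
lda_fold1 = fold_metrics(tp=40, fp=10, fn=4, tn=34)
for name, value in lda_fold1.items():
    print(f"{name}: {value:.8f}")
```

The same check can be repeated for any other fold by solving for the counts from accuracy, precision and recall first.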
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01928306 0.01527429 0.01542139 0.01770949 0.01664853 0.01519966
0.01514888 0.01584053 0.01532054 0.01538372]
mean value: 0.016123008728027344
key: score_time
value: [0.01221609 0.01213503 0.01222253 0.01570082 0.01304436 0.01203132
0.01204228 0.01213694 0.01216316 0.01246881]
mean value: 0.012616133689880371
key: test_mcc
value: [0.60092521 0.25819889 0.43192975 0.59648091 0.51718675 0.44820296
0.42547569 0.42577098 0.38309043 0.33458714]
mean value: 0.4421848698943361
key: train_mcc
value: [0.46296406 0.49560803 0.45453146 0.4500587 0.45896356 0.45870907
0.47255984 0.47612264 0.44425601 0.4758961 ]
mean value: 0.4649669477593169
key: test_accuracy
value: [0.79545455 0.625 0.71590909 0.79545455 0.75862069 0.72413793
0.71264368 0.71264368 0.68965517 0.66666667]
mean value: 0.7196185997910136
key: train_accuracy
value: [0.7302799 0.74681934 0.7264631 0.72391858 0.72808132 0.72808132
0.73570521 0.73697586 0.72172808 0.73697586]
mean value: 0.7315028565331678
key: test_fscore
value: [0.8125 0.66666667 0.71264368 0.80851064 0.75294118 0.72093023
0.71264368 0.72527473 0.71578947 0.68817204]
mean value: 0.7316072312284795
key: train_fscore
value: [0.7433414 0.75761267 0.73748474 0.7369697 0.74278846 0.74216867
0.74509804 0.74848117 0.72929543 0.74786845]
mean value: 0.7431108727766951
key: test_precision
value: [0.75 0.6 0.72093023 0.76 0.76190476 0.72093023
0.70454545 0.70212766 0.66666667 0.65306122]
mean value: 0.7040166232297426
key: train_precision
value: [0.70900693 0.72663551 0.70892019 0.7037037 0.70547945 0.70642202
0.72037915 0.71627907 0.70913462 0.71728972]
mean value: 0.7123250356023364
key: test_recall
value: [0.88636364 0.75 0.70454545 0.86363636 0.74418605 0.72093023
0.72093023 0.75 0.77272727 0.72727273]
mean value: 0.7640591966173361
key: train_recall
value: [0.78117048 0.7913486 0.76844784 0.7735369 0.78426396 0.78172589
0.7715736 0.78371501 0.75063613 0.78117048]
mean value: 0.7767588897069271
key: test_roc_auc
value: [0.79545455 0.625 0.71590909 0.79545455 0.75845666 0.72410148
0.71273784 0.7122093 0.68868922 0.66596195]
mean value: 0.7193974630021142
key: train_roc_auc
value: [0.7302799 0.74681934 0.7264631 0.72391858 0.72800984 0.72801307
0.73565958 0.73703517 0.72176477 0.73703194]
mean value: 0.731499528551685
key: test_jcc
value: [0.68421053 0.5 0.55357143 0.67857143 0.60377358 0.56363636
0.55357143 0.56896552 0.55737705 0.52459016]
mean value: 0.5788267490928233
key: train_jcc
value: [0.59152216 0.60980392 0.58413926 0.58349328 0.59082218 0.59003831
0.59375 0.59805825 0.57392996 0.59727626]
mean value: 0.5912833598721492
MCC on Blind test: 0.29
Accuracy on Blind test: 0.66
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02707696 0.02653337 0.02409434 0.02513218 0.03793049 0.03685093
0.02607822 0.0229454 0.02661228 0.04135823]
mean value: 0.029461240768432616
key: score_time
value: [0.01222515 0.01221347 0.01216626 0.012182 0.02011037 0.01222014
0.01242065 0.01490355 0.01259971 0.01238799]
mean value: 0.013342928886413575
key: test_mcc
value: [0.50709255 0.36363636 0.50709255 0.48758163 0.67900591 0.27128229
0.50648727 0.55996332 0.5606067 0.52749822]
mean value: 0.4970246803472478
key: train_mcc
value: [0.54495505 0.62333053 0.4455788 0.5875648 0.71349834 0.45128891
0.61269854 0.55047307 0.63666649 0.72772028]
mean value: 0.58937748073151
key: test_accuracy
value: [0.70454545 0.68181818 0.70454545 0.73863636 0.83908046 0.59770115
0.74712644 0.75862069 0.77011494 0.75862069]
mean value: 0.7300809822361547
key: train_accuracy
value: [0.73536896 0.80788804 0.67048346 0.78371501 0.85260483 0.67217281
0.79923761 0.74205845 0.79923761 0.85768742]
mean value: 0.7720454200089883
key: test_fscore
value: [0.77192982 0.68181818 0.77192982 0.70886076 0.84090909 0.36363636
0.71052632 0.8 0.8 0.78350515]
mean value: 0.7233115515408763
key: train_fscore
value: [0.78861789 0.79172414 0.75072185 0.75146199 0.86320755 0.51685393
0.77556818 0.79136691 0.82826087 0.86946387]
mean value: 0.7727247167420862
key: test_precision
value: [0.62857143 0.68181818 0.62857143 0.8 0.82222222 0.83333333
0.81818182 0.68852459 0.71428571 0.71698113]
mean value: 0.7332489849223534
key: train_precision
value: [0.65651438 0.86445783 0.60371517 0.88316151 0.8061674 0.98571429
0.88064516 0.6637931 0.72296015 0.80215054]
mean value: 0.7869279536805144
key: test_recall
value: [1. 0.68181818 1. 0.63636364 0.86046512 0.23255814
0.62790698 0.95454545 0.90909091 0.86363636]
mean value: 0.7766384778012685
key: train_recall
value: [0.98727735 0.7302799 0.99236641 0.65394402 0.92893401 0.35025381
0.6928934 0.97964377 0.96946565 0.94910941]
mean value: 0.8234167732269023
key: test_roc_auc
value: [0.70454545 0.68181818 0.70454545 0.73863636 0.83932347 0.5935518
0.74577167 0.75634249 0.76849894 0.75739958]
mean value: 0.7290433403805496
key: train_roc_auc
value: [0.73536896 0.80788804 0.67048346 0.78371501 0.85250772 0.67258237
0.79937291 0.74235995 0.79945364 0.85780344]
mean value: 0.7721535500703943
key: test_jcc
value: [0.62857143 0.51724138 0.62857143 0.54901961 0.7254902 0.22222222
0.55102041 0.66666667 0.66666667 0.6440678 ]
mean value: 0.5799537800703761
key: train_jcc
value: [0.65100671 0.65525114 0.6009245 0.60187354 0.7593361 0.34848485
0.63341067 0.6547619 0.70686456 0.76907216]
mean value: 0.6380986143132776
MCC on Blind test: 0.43
Accuracy on Blind test: 0.73
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03708172 0.03154826 0.02611279 0.050071 0.04132271 0.03019166
0.04098511 0.0399518 0.02971935 0.03148985]
mean value: 0.03584742546081543
key: score_time
value: [0.01232719 0.01233745 0.01214862 0.01300573 0.01256013 0.0125978
0.01266861 0.01263452 0.01259875 0.01262116]
mean value: 0.012549996376037598
key: test_mcc
value: [0.56694671 0.54312363 0.47140452 0.60678804 0.63521 0.15516639
0.5751254 0.50394847 0.5633473 0.36454131]
mean value: 0.49856017625519344
key: train_mcc
value: [0.6109196 0.69300077 0.41316998 0.76800614 0.7421714 0.19534962
0.59538036 0.55886122 0.64229522 0.3612683 ]
mean value: 0.5580422606977329
key: test_accuracy
value: [0.77272727 0.73863636 0.68181818 0.79545455 0.81609195 0.52873563
0.74712644 0.73563218 0.77011494 0.62068966]
mean value: 0.7207027168234065
key: train_accuracy
value: [0.78880407 0.8307888 0.6475827 0.8778626 0.8678526 0.53621347
0.76747141 0.76620076 0.81448539 0.61880559]
mean value: 0.7516067392843633
key: test_fscore
value: [0.73684211 0.78899083 0.75862069 0.81632653 0.82222222 0.08888889
0.7962963 0.68493151 0.73684211 0.72727273]
mean value: 0.6957233898011256
key: train_fscore
value: [0.74772036 0.85271318 0.73892554 0.88785047 0.87619048 0.13711584
0.80957336 0.72372372 0.79320113 0.72273567]
mean value: 0.7289749760328404
key: test_precision
value: [0.875 0.66153846 0.61111111 0.74074074 0.78723404 1.
0.66153846 0.86206897 0.875 0.57142857]
mean value: 0.7645660354427779
key: train_precision
value: [0.92830189 0.75490196 0.58682635 0.82073434 0.82511211 1.
0.68606702 0.88278388 0.89456869 0.56748911]
mean value: 0.7946785350697182
key: test_recall
value: [0.63636364 0.97727273 1. 0.90909091 0.86046512 0.04651163
1. 0.56818182 0.63636364 1. ]
mean value: 0.7634249471458774
key: train_recall
value: [0.6259542 0.97964377 0.99745547 0.96692112 0.93401015 0.07360406
0.98730964 0.61323155 0.71246819 0.99491094]
mean value: 0.78855090995983
key: test_roc_auc
value: [0.77272727 0.73863636 0.68181818 0.79545455 0.81659619 0.52325581
0.75 0.73757928 0.77167019 0.61627907]
mean value: 0.7204016913319239
key: train_roc_auc
value: [0.78880407 0.8307888 0.6475827 0.8778626 0.86776843 0.53680203
0.76719172 0.76600664 0.81435592 0.61928288]
mean value: 0.751644579636016
key: test_jcc
value: [0.58333333 0.65151515 0.61111111 0.68965517 0.69811321 0.04651163
0.66153846 0.52083333 0.58333333 0.57142857]
mean value: 0.5617373303461235
key: train_jcc
value: [0.59708738 0.74324324 0.58594918 0.79831933 0.77966102 0.07360406
0.68006993 0.56705882 0.657277 0.5658466 ]
mean value: 0.6048116553391599
MCC on Blind test: 0.4
Accuracy on Blind test: 0.71
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.26197743 0.24748087 0.24301672 0.24210167 0.24358678 0.24519014
0.24583149 0.24984431 0.24517965 0.24366021]
mean value: 0.2467869281768799
key: score_time
value: [0.0159831 0.01668954 0.01640892 0.016747 0.01565933 0.01589561
0.01652861 0.01699233 0.01569104 0.01592636]
mean value: 0.01625218391418457
key: test_mcc
value: [0.88843109 0.80064077 0.82158384 0.80064077 0.77077916 0.7951307
0.75739672 0.79323121 0.7951307 0.79480784]
mean value: 0.8017772796644179
key: train_mcc
value: [0.88584325 0.89323388 0.90609005 0.89840499 0.91129031 0.88577908
0.8764911 0.8810448 0.90347852 0.88328576]
mean value: 0.8924941735039372
key: test_accuracy
value: [0.94318182 0.89772727 0.90909091 0.89772727 0.88505747 0.89655172
0.87356322 0.89655172 0.89655172 0.89655172]
mean value: 0.899255485893417
key: train_accuracy
value: [0.94274809 0.94656489 0.95292621 0.94910941 0.95552732 0.94282084
0.93773825 0.94027954 0.95171537 0.94155019]
mean value: 0.9460980112580062
key: test_fscore
value: [0.94117647 0.90322581 0.91304348 0.90322581 0.88095238 0.8988764
0.88172043 0.8988764 0.89411765 0.9010989 ]
mean value: 0.9016313729958727
key: train_fscore
value: [0.94353827 0.9469697 0.95345912 0.94962217 0.95608532 0.94339623
0.93928129 0.94117647 0.95189873 0.94206549]
mean value: 0.9467492782258208
key: test_precision
value: [0.97560976 0.85714286 0.875 0.85714286 0.90243902 0.86956522
0.82 0.88888889 0.92682927 0.87234043]
mean value: 0.884495829487831
key: train_precision
value: [0.93069307 0.93984962 0.94278607 0.94014963 0.94540943 0.93516209
0.91767554 0.92610837 0.94710327 0.93266833]
mean value: 0.9357605435912151
key: test_recall
value: [0.90909091 0.95454545 0.95454545 0.95454545 0.86046512 0.93023256
0.95348837 0.90909091 0.86363636 0.93181818]
mean value: 0.9221458773784356
key: train_recall
value: [0.956743 0.95419847 0.96437659 0.95928753 0.96700508 0.95177665
0.96192893 0.956743 0.956743 0.95165394]
mean value: 0.9580456206972269
key: test_roc_auc
value: [0.94318182 0.89772727 0.90909091 0.89772727 0.88477801 0.89693446
0.87447146 0.89640592 0.89693446 0.89614165]
mean value: 0.8993393234672304
key: train_roc_auc
value: [0.94274809 0.94656489 0.95292621 0.94910941 0.95551272 0.94280944
0.93770747 0.94030044 0.95172176 0.94156301]
mean value: 0.9460963433693701
key: test_jcc
value: [0.88888889 0.82352941 0.84 0.82352941 0.78723404 0.81632653
0.78846154 0.81632653 0.80851064 0.82 ]
mean value: 0.8212806992955393
key: train_jcc
value: [0.89311164 0.89928058 0.91105769 0.90407674 0.91586538 0.89285714
0.88551402 0.88888889 0.90821256 0.89047619]
mean value: 0.8989340831326912
MCC on Blind test: 0.57
Accuracy on Blind test: 0.79
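NOTE (editorial sketch, not part of the original run): the mean cross-validation test_mcc and the blind-test MCC reported above can be compared model by model. The numbers are copied from this log, with the CV means rounded to four decimals. The dictionary layout and the name gap are illustrative only. A large positive gap suggests that the cross-validated score overstates performance on unseen data.

```python
# (mean CV test_mcc, blind-test MCC), copied from the log output above.
# Model names are kept exactly as the log spells them.
scores = {
    "LDA": (0.5619, 0.51),
    "Multinomial": (0.4422, 0.29),
    "Passive Aggresive": (0.4970, 0.43),
    "Stochastic GDescent": (0.4986, 0.40),
    "AdaBoost Classifier": (0.8018, 0.57),
}

# Positive gap: the model scored higher in cross-validation than on the blind set.
gap = {name: round(cv - blind, 4) for name, (cv, blind) in scores.items()}

# Print the models from largest to smallest gap.
for name, g in sorted(gap.items(), key=lambda item: item[1], reverse=True):
    cv, blind = scores[name]
    print(f"{name:22s} cv={cv:.4f} blind={blind:.2f} gap={g:+.4f}")
```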
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
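The two warnings above are consistent with `BaggingClassifier` running at its scikit-learn default of 10 estimators (the logged repr shows no `n_estimators`, so the default applies). With so few bootstrap draws, some rows land in every bag and never get an out-of-bag prediction, which leaves a zero row sum in the division at `_bagging.py:753`. A minimal sketch of the arithmetic follows. The training-fold size of 786 is an assumption inferred from the logged fold metrics, not a value the log prints.

```python
# Assumed values: n_train is inferred from the logged fold metrics,
# it is not printed anywhere in the log.
n_train = 786        # hypothetical training-fold size
n_estimators = 10    # scikit-learn default for BaggingClassifier

# Chance that one bootstrap sample of size n_train contains a given row.
p_in_bag = 1 - (1 - 1 / n_train) ** n_train

# A row has no OOB prediction only if every estimator drew it.
p_never_oob = p_in_bag ** n_estimators
expected_rows_without_oob = n_train * p_never_oob

print(f"P(in bag for one estimator) = {p_in_bag:.3f}")
print(f"P(never out-of-bag)         = {p_never_oob:.4f}")
print(f"Expected rows without OOB   = {expected_rows_without_oob:.1f}")
```

Under these assumptions a handful of rows per fit have no OOB estimate. Raising `n_estimators` makes that probability vanishingly small, which is the usual way to silence both warnings.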
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.21937823 0.11462379 0.13031578 0.22587872 0.22438908 0.23684406
0.24717188 0.22145414 0.2242465 0.24388313]
mean value: 0.20881853103637696
key: score_time
value: [0.0309608 0.03981233 0.03937268 0.03936577 0.03722668 0.04367852
0.03383708 0.0447216 0.01878834 0.03858972]
mean value: 0.03663535118103027
key: test_mcc
value: [0.91003151 0.82158384 0.75488987 0.77352678 0.79334038 0.81702814
0.84118687 0.90904296 0.79334038 0.81683533]
mean value: 0.8230806070229827
key: train_mcc
value: [0.98728055 0.98987316 0.98987316 0.98728055 0.98229048 0.9847522
0.9747008 0.99240487 0.98732207 0.99746191]
mean value: 0.9873239747455131
key: test_accuracy
value: [0.95454545 0.90909091 0.875 0.88636364 0.89655172 0.90804598
0.91954023 0.95402299 0.89655172 0.90804598]
mean value: 0.9107758620689655
key: train_accuracy
value: [0.99363868 0.99491094 0.99491094 0.99363868 0.99110546 0.99237611
0.98729352 0.99618806 0.99364676 0.99872935]
mean value: 0.9936438499665363
key: test_fscore
value: [0.95348837 0.91304348 0.88172043 0.88888889 0.89655172 0.90909091
0.92134831 0.95348837 0.89655172 0.91111111]
mean value: 0.9125283324527956
key: train_fscore
value: [0.99363057 0.99488491 0.99488491 0.99364676 0.99106003 0.99238579
0.98721228 0.99616858 0.9936143 0.99872611]
mean value: 0.9936214243611737
key: test_precision
value: [0.97619048 0.875 0.83673469 0.86956522 0.88636364 0.88888889
0.89130435 0.97619048 0.90697674 0.89130435]
mean value: 0.8998518828740554
key: train_precision
value: [0.99489796 1. 1. 0.99238579 0.99742931 0.99238579
0.99484536 1. 0.9974359 1. ]
mean value: 0.996938009696097
key: test_recall
value: [0.93181818 0.95454545 0.93181818 0.90909091 0.90697674 0.93023256
0.95348837 0.93181818 0.88636364 0.93181818]
mean value: 0.9267970401691332
key: train_recall
value: [0.99236641 0.98982188 0.98982188 0.99491094 0.98477157 0.99238579
0.97969543 0.99236641 0.98982188 0.99745547]
mean value: 0.9903417677374355
key: test_roc_auc
value: [0.95454545 0.90909091 0.875 0.88636364 0.89667019 0.9082981
0.919926 0.95428118 0.89667019 0.90776956]
mean value: 0.9108615221987315
key: train_roc_auc
value: [0.99363868 0.99491094 0.99491094 0.99363868 0.99111352 0.9923761
0.98730319 0.99618321 0.99364191 0.99872774]
mean value: 0.9936444892212708
key: test_jcc
value: [0.91111111 0.84 0.78846154 0.8 0.8125 0.83333333
0.85416667 0.91111111 0.8125 0.83673469]
mean value: 0.8399918454561311
key: train_jcc
value: [0.98734177 0.98982188 0.98982188 0.98737374 0.98227848 0.98488665
0.97474747 0.99236641 0.98730964 0.99745547]
mean value: 0.9873403408684838
MCC on Blind test: 0.77
Accuracy on Blind test: 0.89
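The per-fold metrics in the block above can all be derived from a single confusion matrix, which shows how the keys relate to each other. The sketch below reconstructs fold 1 of the Bagging Classifier block. The counts (88 test rows, 44 per class, TP=41, FP=1, FN=3, TN=43) are an inference from the logged precision and recall fractions, not values printed in the log.

```python
from math import sqrt

# Counts inferred from fold 1 of the Bagging Classifier block (assumption:
# 88 test rows, 44 per class; the log does not print these counts).
tp, fp, fn, tn = 41, 1, 3, 43

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
fscore = 2 * tp / (2 * tp + fp + fn)
jcc = tp / (tp + fp + fn)  # Jaccard index of the positive class
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
# ROC AUC of hard 0/1 predictions reduces to the mean of
# sensitivity and specificity.
roc_auc = (recall + tn / (tn + fp)) / 2

for name, value in [
    ("test_precision", precision), ("test_recall", recall),
    ("test_accuracy", accuracy), ("test_fscore", fscore),
    ("test_jcc", jcc), ("test_mcc", mcc), ("test_roc_auc", roc_auc),
]:
    print(f"{name}: {value:.8f}")
```

That `test_roc_auc` equals `test_accuracy` in the balanced folds is consistent with the AUC having been computed from predicted labels rather than from scores or probabilities. The log alone does not confirm how the scorer was configured.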
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.31891465 0.44751048 0.38999581 0.35558724 0.33960152 0.33973193
0.40417552 0.29927397 0.2746563 0.30698133]
mean value: 0.3476428747177124
key: score_time
value: [0.01942086 0.03408337 0.01951623 0.03345537 0.01954579 0.01944494
0.034302 0.0343461 0.01945758 0.03698587]
mean value: 0.027055811882019044
key: test_mcc
value: [0.58681566 0.39903465 0.54772256 0.50847518 0.68133961 0.61371748
0.51879367 0.63261064 0.52312769 0.46459728]
mean value: 0.547623442308813
key: train_mcc
value: [0.92779891 0.93263014 0.91760331 0.9253913 0.91502105 0.92279174
0.9198622 0.92253952 0.92789391 0.92012314]
mean value: 0.9231655222061333
key: test_accuracy
value: [0.78409091 0.69318182 0.77272727 0.75 0.83908046 0.8045977
0.75862069 0.81609195 0.75862069 0.72413793]
mean value: 0.7701149425287356
key: train_accuracy
value: [0.96310433 0.96564885 0.95801527 0.96183206 0.95679797 0.96060991
0.95933926 0.96060991 0.96315121 0.95933926]
mean value: 0.9608448031142193
key: test_fscore
value: [0.80808081 0.72727273 0.7826087 0.77083333 0.84444444 0.81318681
0.76404494 0.82222222 0.77894737 0.76 ]
mean value: 0.7871641356433801
key: train_fscore
value: [0.96415328 0.96654275 0.9592089 0.96296296 0.95802469 0.96177559
0.96039604 0.96158612 0.96415328 0.96039604]
mean value: 0.9619199642766659
key: test_precision
value: [0.72727273 0.65454545 0.75 0.71153846 0.80851064 0.77083333
0.73913043 0.80434783 0.7254902 0.67857143]
mean value: 0.7370240500507275
key: train_precision
value: [0.9375 0.94202899 0.93269231 0.9352518 0.93269231 0.9352518
0.93719807 0.93719807 0.9375 0.93493976]
mean value: 0.9362253092316009
key: test_recall
value: [0.90909091 0.81818182 0.81818182 0.84090909 0.88372093 0.86046512
0.79069767 0.84090909 0.84090909 0.86363636]
mean value: 0.8466701902748415
key: train_recall
value: [0.99236641 0.99236641 0.98727735 0.99236641 0.98477157 0.98984772
0.98477157 0.98727735 0.99236641 0.98727735]
mean value: 0.9890688572867826
key: test_roc_auc
value: [0.78409091 0.69318182 0.77272727 0.75 0.83958774 0.80523256
0.7589852 0.81580338 0.75766385 0.72251586]
mean value: 0.7699788583509514
key: train_roc_auc
value: [0.96310433 0.96564885 0.95801527 0.96183206 0.95676238 0.96057271
0.95930691 0.96064375 0.96318828 0.95937472]
mean value: 0.9608449257953269
key: test_jcc
value: [0.6779661 0.57142857 0.64285714 0.62711864 0.73076923 0.68518519
0.61818182 0.69811321 0.63793103 0.61290323]
mean value: 0.6502454162021041
key: train_jcc
value: [0.93078759 0.9352518 0.9216152 0.92857143 0.91943128 0.9263658
0.92380952 0.92601432 0.93078759 0.92380952]
mean value: 0.9266444050803866
MCC on Blind test: 0.36
Accuracy on Blind test: 0.69
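Each model block repeats the same `key:` / `value:` / `mean value:` triplet, so the log can be read back into a dictionary for comparison across models. The following is a minimal sketch using only the standard library. The function name `parse_metrics` and the regular expression are illustrative assumptions and are not part of the script that produced this log. The sample text is copied from the Gaussian Process block above.

```python
import re
from statistics import mean

# Two metrics copied from the Gaussian Process block of this log.
sample = """key: test_mcc
value: [0.58681566 0.39903465 0.54772256 0.50847518 0.68133961 0.61371748
 0.51879367 0.63261064 0.52312769 0.46459728]
mean value: 0.547623442308813
key: train_mcc
value: [0.92779891 0.93263014 0.91760331 0.9253913 0.91502105 0.92279174
 0.9198622 0.92253952 0.92789391 0.92012314]
mean value: 0.9231655222061333
"""

# One triplet: metric name, bracketed fold values (may span lines), logged mean.
BLOCK = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")


def parse_metrics(text):
    """Return {metric: {"folds": [...], "logged_mean": float}} (hypothetical helper)."""
    parsed = {}
    for name, values, logged_mean in BLOCK.findall(text):
        folds = [float(v) for v in values.split()]
        parsed[name] = {"folds": folds, "logged_mean": float(logged_mean)}
    return parsed


metrics = parse_metrics(sample)
for name, entry in metrics.items():
    gap = abs(mean(entry["folds"]) - entry["logged_mean"])
    print(f"{name}: {len(entry['folds'])} folds, |recomputed - logged| = {gap:.1e}")
```

Recomputing the mean from the printed fold values is a cheap consistency check. Small differences are expected because the fold values are printed to eight decimal places.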
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.08145165 1.06318831 1.06463647 1.06263065 1.06868625 1.07411194
1.07155848 1.07330966 1.06570554 1.06897855]
mean value: 1.0694257497787476
key: score_time
value: [0.00957036 0.00953889 0.009655 0.00960064 0.01019549 0.00988364
0.00950861 0.00972033 0.00976014 0.00954604]
mean value: 0.009697914123535156
key: test_mcc
value: [0.93205893 0.86722738 0.81902836 0.80064077 0.83932347 0.79862977
0.86585804 0.90803383 0.83932347 0.86205074]
mean value: 0.8532174747936125
key: train_mcc
value: [0.94910941 0.96438908 0.94912171 0.95165702 0.96190808 0.96190882
0.96190882 0.9644218 0.95935185 0.96190808]
mean value: 0.9585684683799411
key: test_accuracy
value: [0.96590909 0.93181818 0.90909091 0.89772727 0.91954023 0.89655172
0.93103448 0.95402299 0.91954023 0.93103448]
mean value: 0.9256269592476489
key: train_accuracy
value: [0.97455471 0.9821883 0.97455471 0.97582697 0.98094028 0.98094028
0.98094028 0.98221093 0.97966963 0.98094028]
mean value: 0.9792766359189242
key: test_fscore
value: [0.96629213 0.93478261 0.91111111 0.90322581 0.91954023 0.9010989
0.93333333 0.95454545 0.91954023 0.93181818]
mean value: 0.9275287991655823
key: train_fscore
value: [0.97455471 0.98214286 0.9744898 0.97585769 0.98103666 0.98089172
0.98089172 0.9821883 0.97969543 0.98084291]
mean value: 0.9792591788318852
key: test_precision
value: [0.95555556 0.89583333 0.89130435 0.85714286 0.90909091 0.85416667
0.89361702 0.95454545 0.93023256 0.93181818]
mean value: 0.9073306885395176
key: train_precision
value: [0.97455471 0.98465473 0.9769821 0.97461929 0.97732997 0.98465473
0.98465473 0.9821883 0.97721519 0.98461538]
mean value: 0.9801469132744618
key: test_recall
value: [0.97727273 0.97727273 0.93181818 0.95454545 0.93023256 0.95348837
0.97674419 0.95454545 0.90909091 0.93181818]
mean value: 0.9496828752642706
key: train_recall
value: [0.97455471 0.97964377 0.97201018 0.97709924 0.98477157 0.97715736
0.97715736 0.9821883 0.9821883 0.97709924]
mean value: 0.9783870009428967
key: test_roc_auc
value: [0.96590909 0.93181818 0.90909091 0.89772727 0.91966173 0.89719873
0.93155391 0.95401691 0.91966173 0.93102537]
mean value: 0.9257663847780127
key: train_roc_auc
value: [0.97455471 0.9821883 0.97455471 0.97582697 0.98093541 0.98094509
0.98094509 0.9822109 0.97967283 0.98093541]
mean value: 0.9792769403650172
key: test_jcc
value: [0.93478261 0.87755102 0.83673469 0.82352941 0.85106383 0.82
0.875 0.91304348 0.85106383 0.87234043]
mean value: 0.8655109298113325
key: train_jcc
value: [0.95037221 0.96491228 0.95024876 0.9528536 0.96277916 0.9625
0.9625 0.965 0.960199 0.96240602]
mean value: 0.9593771019712535
MCC on Blind test: 0.67
Accuracy on Blind test: 0.84
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03925729 0.0395782 0.04055047 0.04157686 0.04234195 0.0398705
0.04044795 0.04089832 0.04133368 0.04074621]
mean value: 0.04066014289855957
key: score_time
value: [0.01284862 0.01279402 0.0141449 0.01284528 0.01316023 0.01296759
0.01277375 0.01281953 0.01298618 0.01304054]
mean value: 0.013038063049316406
key: test_mcc
value: [-0.10910895 0.09016696 0.20998026 0.03750293 0.21209676 0.15546399
0.24411022 0.22206651 0.24234093 0.05222823]
mean value: 0.13568478555708904
key: train_mcc
value: [0.25503069 0.24932341 0.23759548 0.26064302 0.22239349 0.24365973
0.23473311 0.23110953 0.25453445 0.24595319]
mean value: 0.24349760981536647
key: test_accuracy
value: [0.47727273 0.52272727 0.55681818 0.51136364 0.56321839 0.54022989
0.55172414 0.55172414 0.57471264 0.51724138]
mean value: 0.5367032392894462
key: train_accuracy
value: [0.5610687 0.55852417 0.55343511 0.56361323 0.5476493 0.55654384
0.55273189 0.5501906 0.56035578 0.55654384]
mean value: 0.5560656469150411
key: test_fscore
value: [0.640625 0.66666667 0.688 0.6504065 0.68333333 0.67213115
0.688 0.69291339 0.69918699 0.66666667]
mean value: 0.6747929695969381
key: train_fscore
value: [0.69496021 0.69373345 0.69129288 0.69619132 0.68881119 0.69305189
0.69122807 0.68947368 0.69434629 0.69251101]
mean value: 0.6925599996064771
key: test_precision
value: [0.48809524 0.51219512 0.5308642 0.50632911 0.53246753 0.51898734
0.52439024 0.53012048 0.5443038 0.51219512]
mean value: 0.5199948190990781
key: train_precision
value: [0.53252033 0.53108108 0.52822581 0.53396739 0.52533333 0.53028264
0.52815013 0.52610442 0.53179973 0.5296496 ]
mean value: 0.5297114452098144
key: test_recall
value: [0.93181818 0.95454545 0.97727273 0.90909091 0.95348837 0.95348837
1. 1. 0.97727273 0.95454545]
mean value: 0.9611522198731501
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.47727273 0.52272727 0.55681818 0.51136364 0.56765328 0.544926
0.55681818 0.54651163 0.57003171 0.51215645]
mean value: 0.5366279069767442
key: train_roc_auc
value: [0.5610687 0.55852417 0.55343511 0.56361323 0.54707379 0.55597964
0.55216285 0.55076142 0.56091371 0.5571066 ]
mean value: 0.5560639232249648
key: test_jcc
value: [0.47126437 0.5 0.52439024 0.48192771 0.51898734 0.50617284
0.52439024 0.53012048 0.5375 0.5 ]
mean value: 0.5094753229670379
key: train_jcc
value: [0.53252033 0.53108108 0.52822581 0.53396739 0.52533333 0.53028264
0.52815013 0.52610442 0.53179973 0.5296496 ]
mean value: 0.5297114452098144
MCC on Blind test: 0.06
Accuracy on Blind test: 0.45
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.01993704 0.0179441 0.01770711 0.03821087 0.04329848 0.01797175
0.0178349 0.03925276 0.04720855 0.04208231]
mean value: 0.030144786834716795
key: score_time
value: [0.01368213 0.01220298 0.01225615 0.01899743 0.01320052 0.01226377
0.02703691 0.01902175 0.01903915 0.01890492]
mean value: 0.016660571098327637
key: test_mcc
value: [0.6882472 0.42521003 0.54601891 0.5933661 0.67900591 0.54295079
0.69052856 0.70984404 0.58821234 0.50171077]
mean value: 0.5965094644310827
key: train_mcc
value: [0.72661129 0.72075868 0.7378189 0.74120574 0.71121629 0.71197478
0.72691923 0.74429699 0.72825208 0.74359616]
mean value: 0.72926501243974
key: test_accuracy
value: [0.84090909 0.70454545 0.77272727 0.79545455 0.83908046 0.77011494
0.83908046 0.85057471 0.79310345 0.74712644]
mean value: 0.7952716823406478
key: train_accuracy
value: [0.86132316 0.85877863 0.86768448 0.86895674 0.85387548 0.85387548
0.86149936 0.8703939 0.86149936 0.8703939 ]
mean value: 0.8628280486661429
key: test_fscore
value: [0.85106383 0.74 0.77777778 0.80434783 0.84090909 0.77777778
0.85106383 0.86315789 0.80434783 0.77083333]
mean value: 0.8081279186283203
key: train_fscore
value: [0.86819831 0.86512758 0.87286064 0.87484812 0.86094317 0.86161252
0.86851628 0.87621359 0.86914766 0.87560976]
mean value: 0.8693077616688508
key: test_precision
value: [0.8 0.66071429 0.76086957 0.77083333 0.82222222 0.74468085
0.78431373 0.80392157 0.77083333 0.71153846]
mean value: 0.7629927346540505
key: train_precision
value: [0.82718894 0.82790698 0.84 0.8372093 0.8221709 0.81922197
0.82758621 0.83758701 0.82272727 0.84074941]
mean value: 0.8302347988922448
key: test_recall
value: [0.90909091 0.84090909 0.79545455 0.84090909 0.86046512 0.81395349
0.93023256 0.93181818 0.84090909 0.84090909]
mean value: 0.8604651162790697
key: train_recall
value: [0.91348601 0.90585242 0.90839695 0.91603053 0.9035533 0.90862944
0.91370558 0.91857506 0.92111959 0.91348601]
mean value: 0.9122834889758593
key: test_roc_auc
value: [0.84090909 0.70454545 0.77272727 0.79545455 0.83932347 0.77061311
0.84011628 0.84963002 0.79254757 0.74603594]
mean value: 0.7951902748414376
key: train_roc_auc
value: [0.86132316 0.85877863 0.86768448 0.86895674 0.85381227 0.85380581
0.86143294 0.87045504 0.86157502 0.87044859]
mean value: 0.8628272690871985
key: test_jcc
value: [0.74074074 0.58730159 0.63636364 0.67272727 0.7254902 0.63636364
0.74074074 0.75925926 0.67272727 0.62711864]
mean value: 0.6798832986370374
key: train_jcc
value: [0.76709402 0.76231263 0.77440347 0.7775378 0.75583864 0.75687104
0.76759062 0.77969762 0.76857749 0.77874187]
mean value: 0.7688665198477691
MCC on Blind test: 0.44
Accuracy on Blind test: 0.73
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_ppi2_affinity', 'interface_dist',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=168)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.32210398 0.26895761 0.34120345 0.32642722 0.34458899 0.4479835
0.36319351 0.43385816 0.46567249 0.45032525]
mean value: 0.37643141746520997
key: score_time
value: [0.01219916 0.01898623 0.01927805 0.01902223 0.02566624 0.01927495
0.02244329 0.02526784 0.02273488 0.02529621]
mean value: 0.021016907691955567
key: test_mcc
value: [0.6882472 0.42521003 0.54601891 0.59648091 0.65994555 0.54295079
0.69052856 0.70540345 0.5641598 0.50171077]
mean value: 0.592065596277015
key: train_mcc
value: [0.72661129 0.72075868 0.7378189 0.75824295 0.74898219 0.71197478
0.72691923 0.75450866 0.75617256 0.74359616]
mean value: 0.7385585392437356
key: test_accuracy
value: [0.84090909 0.70454545 0.77272727 0.79545455 0.82758621 0.77011494
0.83908046 0.85057471 0.7816092 0.74712644]
mean value: 0.7929728317659352
key: train_accuracy
value: [0.86132316 0.85877863 0.86768448 0.8778626 0.8729352 0.85387548
0.86149936 0.87547649 0.87547649 0.8703939 ]
mean value: 0.8675305779993598
key: test_fscore
value: [0.85106383 0.74 0.77777778 0.80851064 0.83516484 0.77777778
0.85106383 0.86021505 0.79120879 0.77083333]
mean value: 0.8063615866898297
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_sl.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore
value: [0.86819831 0.86512758 0.87286064 0.88264059 0.87864078 0.86161252
0.86851628 0.88106796 0.88221154 0.87560976]
mean value: 0.8736485943790752
key: test_precision
value: [0.8 0.66071429 0.76086957 0.76 0.79166667 0.74468085
0.78431373 0.81632653 0.76595745 0.71153846]
mean value: 0.7596067533111587
key: train_precision
value: [0.82718894 0.82790698 0.84 0.84941176 0.84186047 0.81922197
0.82758621 0.84222738 0.83599089 0.84074941]
mean value: 0.8352144002611301
key: test_recall
value: [0.90909091 0.84090909 0.79545455 0.86363636 0.88372093 0.81395349
0.93023256 0.90909091 0.81818182 0.84090909]
mean value: 0.8605179704016913
key: train_recall
value: [0.91348601 0.90585242 0.90839695 0.91857506 0.91878173 0.90862944
0.91370558 0.92366412 0.93384224 0.91348601]
mean value: 0.9158419550251223
key: test_roc_auc
value: [0.84090909 0.70454545 0.77272727 0.79545455 0.8282241 0.77061311
0.84011628 0.84989429 0.78118393 0.74603594]
mean value: 0.7929704016913319
key: train_roc_auc
value: [0.86132316 0.85877863 0.86768448 0.8778626 0.87287687 0.85380581
0.86143294 0.87553764 0.87555056 0.87044859]
mean value: 0.867530127484791
key: test_jcc
value: [0.74074074 0.58730159 0.63636364 0.67857143 0.71698113 0.63636364
0.74074074 0.75471698 0.65454545 0.62711864]
mean value: 0.6773443981902568
key: train_jcc
value: [0.76709402 0.76231263 0.77440347 0.78993435 0.78354978 0.75687104
0.76759062 0.78741866 0.78924731 0.77874187]
mean value: 0.7757163746391412
MCC on Blind test: 0.47
Accuracy on Blind test: 0.74