LSHTM_analysis/scripts/ml/log_rpob_cd_8020.txt
2022-06-20 21:55:47 +01:00

/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_8020.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64
count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 80/20
Original data size: (1132, 176)
Train data size: (905, 176)
Test data size: (227, 176)
y_train numbers: Counter({0: 661, 1: 244})
y_train ratio: 2.709016393442623
y_test_numbers: Counter({0: 166, 1: 61})
y_test ratio: 2.721311475409836
-------------------------------------------------------------
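Note on the split figures above: each logged ratio equals the majority-class count divided by the minority-class count, and rounding the 20% test share up to a whole row reproduces the logged train/test sizes. The following is a minimal sketch using only the counts printed in this log; the 0.2 test fraction and the round-up rule are assumptions, since the splitting code itself is not shown here.

```python
import math
from collections import Counter

# Class counts copied from the log (0 = majority class, 1 = minority class).
y_train = Counter({0: 661, 1: 244})
y_test = Counter({0: 166, 1: 61})


def class_ratio(counts):
    """Majority-to-minority ratio, as printed in the 'ratio' lines."""
    return counts[0] / counts[1]


# Assumed rule: the test share is 20% of all rows, rounded up to a whole row.
n_total = 1132
n_test = math.ceil(0.2 * n_total)
n_train = n_total - n_test

print("train ratio:", class_ratio(y_train))
print("test ratio:", class_ratio(y_test))
print("train rows:", n_train, "test rows:", n_test)
```

The two ratios being close to each other is what a stratified split is expected to give: both partitions keep roughly the same class balance as the full data.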
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
index: 2
ind: 3
Mask count check: True
Original Data
Counter({0: 661, 1: 244}) Data dim: (905, 176)
Simple Random OverSampling
Counter({0: 661, 1: 661})
(1322, 176)
Simple Random UnderSampling
Counter({0: 244, 1: 244})
(488, 176)
Simple Combined Over and UnderSampling
Counter({0: 661, 1: 661})
(1322, 176)
SMOTE_NC OverSampling
Counter({0: 661, 1: 661})
(1322, 176)
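Note on the resampling counts above: random oversampling draws extra minority rows with replacement until both classes match the majority count, while random undersampling keeps only as many majority rows as there are minority rows. The sketch below illustrates the idea with the standard library only. It is not the code that produced this log (which is not shown); the function names `random_oversample` and `random_undersample` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(labels, seed=42):
    """Return row indices after duplicating minority rows up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw with replacement; k is 0 for the majority class.
        indices.extend(rng.choices(pool, k=target - n))
    return indices


def random_undersample(labels, seed=42):
    """Return row indices after dropping majority rows down to the minority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw without replacement so no row is repeated.
        indices.extend(rng.sample(pool, target))
    return sorted(indices)


# Training labels rebuilt from the logged class counts.
y_train = [0] * 661 + [1] * 244

over_counts = Counter(y_train[i] for i in random_oversample(y_train))
under_counts = Counter(y_train[i] for i in random_undersample(y_train))
print(over_counts, sum(over_counts.values()))
print(under_counts, sum(under_counts.values()))
```

Working with row indices rather than rows keeps the sketch independent of how the 176 feature columns are stored.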
#####################################################################
Running ML analysis [COMPLETE DATA]: 80/20 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_cd_8020/
Sanity checks:
Total input features: 176
Training data size: (905, 176)
Test data size: (227, 176)
Target feature numbers (training data): Counter({0: 661, 1: 244})
Target features ratio (training data): 2.709016393442623
Target feature numbers (test data): Counter({0: 166, 1: 61})
Target features ratio (test data): 2.721311475409836
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
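Note on the feature-count check above: the per-group counts printed in this section add up to the 176 input features, and the four non-categorical groups add up to the 169 numerical features reported earlier. A small sketch of that arithmetic, using only numbers from this log; the dictionary name `feature_groups` is illustrative.

```python
# Group sizes copied from the "(n): ..." lines of the log.
feature_groups = {
    "structural": 37,   # 11 common stability + 23 FoldX + 3 other columns
    "aaindex": 123,
    "evolutionary": 3,
    "genomic": 6,
    "categorical": 7,
}

total_features = sum(feature_groups.values())
numerical_features = total_features - feature_groups["categorical"]

print("total:", total_features)
print("numerical:", numerical_features)
```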
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.04789352 0.06307125 0.0481348 0.10236144 0.18239737 0.0426147
0.10045457 0.15550613 0.0969727 0.05134344]
mean value: 0.08907499313354492
key: score_time
value: [0.01345658 0.01279902 0.01375747 0.01285005 0.01630592 0.02184248
0.01292872 0.01723862 0.01306796 0.02284384]
mean value: 0.015709066390991212
key: test_mcc
value: [0.63725369 0.57246685 0.55878788 0.54842288 0.51496742 0.54545455
0.51508188 0.65091508 0.70352647 0.59215653]
mean value: 0.5839033214050867
key: train_mcc
value: [0.71248167 0.70245135 0.68901123 0.72431999 0.703704 0.71767136
0.72242976 0.69790312 0.69642243 0.69924227]
mean value: 0.7065637192314627
key: test_accuracy
value: [0.85714286 0.83516484 0.82417582 0.82417582 0.81318681 0.82222222
0.82222222 0.86666667 0.88888889 0.83333333]
mean value: 0.8387179487179487
key: train_accuracy
value: [0.88820639 0.88452088 0.87960688 0.89312039 0.88329238 0.89079755
0.89202454 0.88220859 0.88220859 0.88343558]
mean value: 0.8859421775372696
key: test_fscore
value: [0.73469388 0.68085106 0.68 0.66666667 0.63829787 0.66666667
0.61904762 0.73913043 0.76190476 0.70588235]
mean value: 0.6893141315730733
key: train_fscore
value: [0.78787879 0.78037383 0.76995305 0.79625293 0.78359909 0.79058824
0.79534884 0.77777778 0.77570093 0.77751756]
mean value: 0.7834991036799865
key: test_precision
value: [0.72 0.72727273 0.68 0.69565217 0.68181818 0.66666667
0.72222222 0.77272727 0.88888889 0.66666667]
mean value: 0.722191480017567
key: train_precision
value: [0.80861244 0.79904306 0.79227053 0.81730769 0.78181818 0.8195122
0.81428571 0.79245283 0.79807692 0.80193237]
mean value: 0.8025311937742211
key: test_recall
value: [0.75 0.64 0.68 0.64 0.6 0.66666667
0.54166667 0.70833333 0.66666667 0.75 ]
mean value: 0.6643333333333333
key: train_recall
value: [0.76818182 0.76255708 0.74885845 0.77625571 0.78538813 0.76363636
0.77727273 0.76363636 0.75454545 0.75454545]
mean value: 0.7654877542548775
key: test_roc_auc
value: [0.82276119 0.77454545 0.77939394 0.7669697 0.7469697 0.77272727
0.73295455 0.81628788 0.81818182 0.80681818]
mean value: 0.7837609678878336
key: train_roc_auc
value: [0.85042088 0.84598442 0.83829477 0.85619508 0.85235793 0.85072574
0.85586325 0.84484339 0.84197861 0.84281895]
mean value: 0.8479483023318639
key: test_jcc
value: [0.58064516 0.51612903 0.51515152 0.5 0.46875 0.5
0.44827586 0.5862069 0.61538462 0.54545455]
mean value: 0.5275997628159753
key: train_jcc
value: [0.65 0.63984674 0.6259542 0.6614786 0.64419476 0.6536965
0.66023166 0.63636364 0.63358779 0.63601533]
mean value: 0.644136920412421
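Note on the cross-validation output above: each "mean value" is the arithmetic mean of the ten per-fold scores, and comparing the train and test means gives a rough view of how much the model overfits. The sketch below recomputes the MCC means from the fold values printed in this log; reading the train/test gap as a sign of overfitting is a common rule of thumb, not something the log itself states.

```python
from statistics import mean

# Per-fold MCC values copied from the log (10-fold cross-validation).
test_mcc = [0.63725369, 0.57246685, 0.55878788, 0.54842288, 0.51496742,
            0.54545455, 0.51508188, 0.65091508, 0.70352647, 0.59215653]
train_mcc = [0.71248167, 0.70245135, 0.68901123, 0.72431999, 0.703704,
             0.71767136, 0.72242976, 0.69790312, 0.69642243, 0.69924227]

mean_test = mean(test_mcc)
mean_train = mean(train_mcc)
gap = mean_train - mean_test  # larger gap suggests more overfitting

print(f"mean test MCC:  {mean_test:.4f}")
print(f"mean train MCC: {mean_train:.4f}")
print(f"train-test gap: {gap:.4f}")
```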
MCC on Blind test: 0.61
Accuracy on Blind test: 0.85
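Note on the blind-test metrics above: the Matthews correlation coefficient (MCC) uses all four confusion-matrix cells, which is why it can sit well below accuracy on imbalanced data such as this test set (166 vs 61). The sketch below implements the standard MCC formula. The confusion-matrix counts are hypothetical, chosen only to respect the logged class totals; the log does not report the real matrix, so the printed numbers are not a reconstruction of the logged result.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 61 positives (tp + fn) and 166 negatives (tn + fp).
tp, fn = 45, 16
tn, fp = 148, 18

example_mcc = mcc(tp, tn, fp, fn)
example_acc = accuracy(tp, tn, fp, fn)
print(f"MCC: {example_mcc:.2f}  accuracy: {example_acc:.2f}")
```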
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.18454385 3.19450092 3.09683037 1.91522408 1.2263341 1.21541953
1.31338263 1.00562215 1.47615552 1.02071047]
mean value: 1.7648723602294922
key: score_time
value: [0.02524209 0.05973387 0.05177426 0.01649189 0.02263284 0.01362753
0.01753664 0.0148828 0.02008581 0.01392865]
mean value: 0.02559363842010498
key: test_mcc
value: [0.62759962 0.6529561 0.67809672 0.54842288 0.44620513 0.51076836
0.45226702 0.67722905 0.70509509 0.61348661]
mean value: 0.5912126585754784
key: train_mcc
value: [0.83425771 0.80351061 0.82034758 0.82665973 0.77386796 0.81174107
0.76434069 0.73636649 0.78610731 0.75931692]
mean value: 0.7916516091901614
key: test_accuracy
value: [0.84615385 0.86813187 0.86813187 0.82417582 0.79120879 0.81111111
0.8 0.87777778 0.88888889 0.84444444]
mean value: 0.842002442002442
key: train_accuracy
value: [0.93488943 0.92260442 0.92997543 0.93243243 0.91154791 0.92638037
0.90797546 0.89693252 0.91656442 0.90674847]
mean value: 0.9186050858443496
key: test_fscore
value: [0.73076923 0.72727273 0.76923077 0.66666667 0.57777778 0.63829787
0.57142857 0.75555556 0.77272727 0.72 ]
mean value: 0.6929726443768996
key: train_fscore
value: [0.87871854 0.85649203 0.86774942 0.87238979 0.83410138 0.86175115
0.82678984 0.80645161 0.84259259 0.82159624]
mean value: 0.8468632596467518
key: test_precision
value: [0.67857143 0.84210526 0.74074074 0.69565217 0.65 0.65217391
0.66666667 0.80952381 0.85 0.69230769]
mean value: 0.7277741687924755
key: train_precision
value: [0.88479263 0.85454545 0.88207547 0.88679245 0.84186047 0.87383178
0.84037559 0.81775701 0.85849057 0.84951456]
mean value: 0.8590035971963867
key: test_recall
value: [0.79166667 0.64 0.8 0.64 0.52 0.625
0.5 0.70833333 0.70833333 0.75 ]
mean value: 0.6683333333333333
key: train_recall
value: [0.87272727 0.85844749 0.85388128 0.85844749 0.82648402 0.85
0.81363636 0.79545455 0.82727273 0.79545455]
mean value: 0.8351805728518057
key: test_roc_auc
value: [0.82866915 0.79727273 0.8469697 0.7669697 0.7069697 0.75189394
0.70454545 0.82386364 0.83143939 0.81439394]
mean value: 0.7872987336047038
key: train_roc_auc
value: [0.91531987 0.90233299 0.90593224 0.90905568 0.88467058 0.90231092
0.87824675 0.86495416 0.88842628 0.87167685]
mean value: 0.8922926320106014
key: test_jcc
value: [0.57575758 0.57142857 0.625 0.5 0.40625 0.46875
0.4 0.60714286 0.62962963 0.5625 ]
mean value: 0.5346458633958634
key: train_jcc
value: [0.78367347 0.74900398 0.76639344 0.77366255 0.71541502 0.75708502
0.70472441 0.67567568 0.728 0.69721116]
mean value: 0.7350844728023521
MCC on Blind test: 0.61
Accuracy on Blind test: 0.85
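Note on the key/value blocks above: the key names (fit_time, score_time, test_mcc, train_mcc, ..., test_jcc) and the ten values per key match the shape of the dictionary returned by scikit-learn's cross_validate when it is given a dict of named scorers, 10 folds, and return_train_score=True. The sketch below reproduces that layout. It is an assumption about how such output can be generated, not the original script: the synthetic data, the exact scorer definitions, and the fold settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, mildly imbalanced stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=400, n_features=15, weights=[0.7, 0.3], random_state=42
)

# Scorer names chosen to mirror the suffixes seen in the log (mcc, jcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

pipe = Pipeline(
    steps=[
        ("scale", MinMaxScaler()),
        ("model", LogisticRegression(max_iter=1000, random_state=42)),
    ]
)

# Ten stratified folds give ten values per key, as in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same "key / value / mean value" layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```

A large gap between a train_* mean and its test_* counterpart, such as train_mcc 0.79 against test_mcc 0.59 for LogisticRegressionCV above, is the usual signal of overfitting to the training folds.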
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0187583 0.01387739 0.01324153 0.01288629 0.01322436 0.0131669
0.01319027 0.01312518 0.01313162 0.01309538]
mean value: 0.013769721984863282
key: score_time
value: [0.01342607 0.01070094 0.0104444 0.0103941 0.01025105 0.01064491
0.01033258 0.0104003 0.01012063 0.01041937]
mean value: 0.010713434219360352
key: test_mcc
value: [0.46431857 0.34257106 0.3547423 0.44066378 0.51102889 0.44718
0.46296162 0.44441468 0.4674735 0.46313625]
mean value: 0.43984906387328504
key: train_mcc
value: [0.49764726 0.46577226 0.48255617 0.46105197 0.4887123 0.48766662
0.46709067 0.46459558 0.51438731 0.45890879]
mean value: 0.4788388923833725
key: test_accuracy
value: [0.76923077 0.69230769 0.73626374 0.74725275 0.79120879 0.77777778
0.75555556 0.75555556 0.78888889 0.77777778]
mean value: 0.7591819291819292
key: train_accuracy
value: [0.79115479 0.74078624 0.77886978 0.77027027 0.78869779 0.78650307
0.77546012 0.77177914 0.79509202 0.77055215]
mean value: 0.7769165372846354
key: test_fscore
value: [0.61818182 0.5483871 0.53846154 0.61016949 0.65454545 0.6
0.62068966 0.60714286 0.6122449 0.61538462]
mean value: 0.6025207425147499
key: train_fscore
value: [0.64135021 0.62388592 0.63265306 0.61758691 0.63404255 0.63445378
0.62111801 0.62040816 0.65424431 0.61601643]
mean value: 0.6295759346178662
key: test_precision
value: [0.5483871 0.45945946 0.51851852 0.52941176 0.6 0.57692308
0.52941176 0.53125 0.6 0.57142857]
mean value: 0.5464790252515584
key: train_precision
value: [0.5984252 0.51169591 0.57195572 0.55925926 0.5936255 0.58984375
0.57034221 0.56296296 0.60076046 0.56179775]
mean value: 0.5720668707476475
key: test_recall
value: [0.70833333 0.68 0.56 0.72 0.72 0.625
0.75 0.70833333 0.625 0.66666667]
mean value: 0.6763333333333333
key: train_recall
value: [0.69090909 0.79908676 0.70776256 0.68949772 0.6803653 0.68636364
0.68181818 0.69090909 0.71818182 0.68181818]
mean value: 0.7026712328767123
key: test_roc_auc
value: [0.74968905 0.68848485 0.68151515 0.73878788 0.76909091 0.72916667
0.75378788 0.7405303 0.73674242 0.74242424]
mean value: 0.7330219357756671
key: train_roc_auc
value: [0.75959596 0.75920724 0.75640229 0.74474886 0.75446836 0.75494652
0.74595111 0.74629488 0.77085561 0.74258976]
mean value: 0.753506060373506
key: test_jcc
value: [0.44736842 0.37777778 0.36842105 0.43902439 0.48648649 0.42857143
0.45 0.43589744 0.44117647 0.44444444]
mean value: 0.43191679076939216
key: train_jcc
value: [0.47204969 0.45336788 0.46268657 0.44674556 0.46417445 0.46461538
0.45045045 0.44970414 0.48615385 0.44510386]
mean value: 0.459505183000996
MCC on Blind test: 0.43
Accuracy on Blind test: 0.77
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01282287 0.01630855 0.01634121 0.01639724 0.01683545 0.01645255
0.01656127 0.01641417 0.01643634 0.01640034]
mean value: 0.016096997261047363
key: score_time
value: [0.01217628 0.01253057 0.01262355 0.01262665 0.01264167 0.01256514
0.01254058 0.01260877 0.01280284 0.01267886]
mean value: 0.01257948875427246
key: test_mcc
value: [0.52551942 0.60507041 0.45746799 0.30563727 0.39333333 0.30012252
0.40291148 0.64071161 0.4755188 0.44718 ]
mean value: 0.45534728443319816
key: train_mcc
value: [0.53038838 0.50974707 0.52491489 0.53390093 0.54820336 0.54668176
0.49997774 0.50534121 0.55293504 0.52295749]
mean value: 0.5275047871297697
key: test_accuracy
value: [0.81318681 0.84615385 0.79120879 0.73626374 0.75824176 0.73333333
0.77777778 0.86666667 0.8 0.77777778]
mean value: 0.7900610500610501
key: train_accuracy
value: [0.81449631 0.81326781 0.81695332 0.82063882 0.82678133 0.82208589
0.80981595 0.80613497 0.82453988 0.81717791]
mean value: 0.8171892193364586
key: test_fscore
value: [0.65306122 0.70833333 0.59574468 0.47826087 0.56 0.47826087
0.54545455 0.71428571 0.60869565 0.6 ]
mean value: 0.5942096889718801
key: train_fscore
value: [0.65759637 0.63285024 0.64775414 0.65402844 0.66348449 0.66819222
0.62469734 0.63761468 0.67276888 0.64439141]
mean value: 0.6503378195409838
key: test_precision
value: [0.64 0.73913043 0.63636364 0.52380952 0.56 0.5
0.6 0.83333333 0.63636364 0.57692308]
mean value: 0.6245923641575816
key: train_precision
value: [0.6561086 0.67179487 0.67156863 0.67980296 0.695 0.67281106
0.66839378 0.64351852 0.67741935 0.67839196]
mean value: 0.6714809727643422
key: test_recall
value: [0.66666667 0.68 0.56 0.44 0.56 0.45833333
0.5 0.625 0.58333333 0.625 ]
mean value: 0.5698333333333333
key: train_recall
value: [0.65909091 0.59817352 0.62557078 0.63013699 0.6347032 0.66363636
0.58636364 0.63181818 0.66818182 0.61363636]
mean value: 0.6311311747613118
key: test_roc_auc
value: [0.76616915 0.79454545 0.71939394 0.64424242 0.69666667 0.64583333
0.68939394 0.78977273 0.73106061 0.72916667]
mean value: 0.7206244911804613
key: train_roc_auc
value: [0.76557239 0.74530525 0.75648287 0.76044664 0.76609109 0.77215432
0.73940031 0.75120321 0.77526738 0.75303667]
mean value: 0.7584960120757864
key: test_jcc
value: [0.48484848 0.5483871 0.42424242 0.31428571 0.38888889 0.31428571
0.375 0.55555556 0.4375 0.42857143]
mean value: 0.4271565307452404
key: train_jcc
value: [0.48986486 0.46289753 0.47902098 0.48591549 0.49642857 0.50171821
0.45422535 0.46801347 0.50689655 0.47535211]
mean value: 0.4820333132358686
MCC on Blind test: 0.48
Accuracy on Blind test: 0.81
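Note on the block above: the key names (`fit_time`, `score_time`, `test_mcc`, `train_mcc`, ..., `train_jcc`) are consistent with the dictionary returned by scikit-learn's `cross_validate` when it is given a dict of named scorers and `return_train_score=True`. The following is a hedged sketch of a pipeline with the same shape as the one printed in this log, run with 10 folds. The synthetic data, the three-column feature set, the `StratifiedKFold` settings and the scorer definitions are assumptions, not the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (hypothetical values); column names borrowed from the log.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "ligand_distance": rng.uniform(0, 30, n),
    "ddg_foldx": rng.normal(0, 2, n),
    "ss_class": rng.choice(["helix", "sheet", "loop"], n),
})
y = ((X["ss_class"] == "helix") | (X["ligand_distance"] < 5)).astype(int)

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same structure as the logged pipeline: scale numerical columns, one-hot encode
# categorical ones, pass anything else through unchanged.
prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
    remainder="passthrough",
)
pipe = Pipeline(steps=[("prep", prep), ("model", BernoulliNB())])

# Scorer names chosen so the result keys match those in the log (assumed mapping).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring, return_train_score=True)

# Print in the same key / value / mean value layout used throughout this log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Placing the scaler and encoder inside the `Pipeline` means they are refit on the training part of every fold, so no information from a test fold leaks into preprocessing.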
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01524544 0.01264095 0.01103282 0.01104927 0.01092625 0.01289582
0.01252794 0.01093578 0.01066732 0.01054311]
mean value: 0.011846470832824706
key: score_time
value: [0.10600185 0.01958251 0.02010822 0.0199244 0.01993418 0.02180696
0.0200088 0.02038217 0.02008438 0.01971579]
mean value: 0.028754925727844237
key: test_mcc
value: [0.4359729 0.2248982 0.25060326 0.15668154 0.32373419 0.24534987
0.40451992 0.4792982 0.35478744 0.35478744]
mean value: 0.32306329498533326
key: train_mcc
value: [0.55031 0.57866175 0.58609779 0.59710057 0.59745376 0.59928007
0.57365543 0.57455254 0.57119357 0.59242769]
mean value: 0.5820733171911701
key: test_accuracy
value: [0.8021978 0.73626374 0.72527473 0.72527473 0.75824176 0.73333333
0.78888889 0.81111111 0.77777778 0.77777778]
mean value: 0.7636141636141636
key: train_accuracy
value: [0.83415233 0.84520885 0.84766585 0.85135135 0.85135135 0.85153374
0.84294479 0.84294479 0.84171779 0.84907975]
mean value: 0.8457950588625435
key: test_fscore
value: [0.52631579 0.33333333 0.41860465 0.24242424 0.45 0.4
0.51282051 0.58536585 0.44444444 0.44444444]
mean value: 0.4357753271761989
key: train_fscore
value: [0.64 0.64804469 0.65555556 0.66481994 0.67029973 0.67385445
0.65027322 0.65591398 0.6541555 0.67024129]
mean value: 0.6583158353231275
key: test_precision
value: [0.71428571 0.54545455 0.5 0.5 0.6 0.5
0.66666667 0.70588235 0.66666667 0.66666667]
mean value: 0.6065622612681436
key: train_precision
value: [0.77419355 0.83453237 0.83687943 0.84507042 0.83108108 0.82781457
0.81506849 0.80263158 0.79738562 0.81699346]
mean value: 0.8181650585330019
key: test_recall
value: [0.41666667 0.24 0.36 0.16 0.36 0.33333333
0.41666667 0.5 0.33333333 0.33333333]
mean value: 0.3453333333333333
key: train_recall
value: [0.54545455 0.52968037 0.53881279 0.54794521 0.56164384 0.56818182
0.54090909 0.55454545 0.55454545 0.56818182]
mean value: 0.5509900373599004
key: test_roc_auc
value: [0.67848259 0.58212121 0.61181818 0.54969697 0.63454545 0.60606061
0.67045455 0.71212121 0.63636364 0.63636364]
mean value: 0.6318028041610131
key: train_roc_auc
value: [0.74326599 0.74551245 0.75007866 0.75548521 0.75981351 0.76224217
0.74776547 0.75206264 0.75122231 0.7605615 ]
mean value: 0.7528009915741585
key: test_jcc
value: [0.35714286 0.2 0.26470588 0.13793103 0.29032258 0.25
0.34482759 0.4137931 0.28571429 0.28571429]
mean value: 0.2830151615707462
key: train_jcc
value: [0.47058824 0.47933884 0.48760331 0.49792531 0.50409836 0.50813008
0.48178138 0.488 0.48605578 0.50403226]
mean value: 0.49075535486894833
MCC on Blind test: 0.38
Accuracy on Blind test: 0.78
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.04472566 0.03670692 0.03695655 0.03710032 0.03720188 0.03994393
0.04123163 0.04074073 0.03910685 0.03895974]
mean value: 0.03926742076873779
key: score_time
value: [0.01724148 0.01627874 0.0164144 0.01637268 0.01719189 0.01736212
0.01677203 0.01638651 0.01843786 0.01628327]
mean value: 0.016874098777770997
key: test_mcc
value: [0.56001732 0.51496742 0.63725369 0.50565559 0.4351278 0.38019877
0.55001241 0.59244966 0.60677988 0.54545455]
mean value: 0.532791707378351
key: train_mcc
value: [0.66223433 0.67434671 0.6577144 0.68087197 0.68545586 0.66431904
0.66727313 0.65537963 0.66901689 0.67744119]
mean value: 0.6694053147134513
key: test_accuracy
value: [0.83516484 0.81318681 0.85714286 0.81318681 0.78021978 0.76666667
0.83333333 0.84444444 0.85555556 0.82222222]
mean value: 0.8221123321123321
key: train_accuracy
value: [0.86977887 0.87592138 0.86977887 0.87837838 0.87837838 0.87239264
0.86993865 0.86871166 0.87361963 0.87730061]
mean value: 0.8734199062419921
key: test_fscore
value: [0.66666667 0.63829787 0.73469388 0.62222222 0.58333333 0.53333333
0.65116279 0.69565217 0.66666667 0.66666667]
mean value: 0.6458695603391053
key: train_fscore
value: [0.74881517 0.75425791 0.74146341 0.75912409 0.76705882 0.74509804
0.75576037 0.73965937 0.75060533 0.75490196]
mean value: 0.7516744462110857
key: test_precision
value: [0.71428571 0.68181818 0.75 0.7 0.60869565 0.57142857
0.73684211 0.72727273 0.86666667 0.66666667]
mean value: 0.7023676285575599
key: train_precision
value: [0.78217822 0.80729167 0.79581152 0.8125 0.79126214 0.80851064
0.76635514 0.79581152 0.80310881 0.81914894]
mean value: 0.798197858000515
key: test_recall
value: [0.625 0.6 0.72 0.56 0.56 0.5
0.58333333 0.66666667 0.54166667 0.66666667]
mean value: 0.6023333333333334
key: train_recall
value: [0.71818182 0.70776256 0.69406393 0.71232877 0.74429224 0.69090909
0.74545455 0.69090909 0.70454545 0.7 ]
mean value: 0.7108447488584475
key: test_roc_auc
value: [0.76772388 0.7469697 0.81454545 0.73454545 0.71181818 0.68181818
0.75378788 0.78787879 0.75568182 0.77272727]
mean value: 0.7527496607869741
key: train_roc_auc
value: [0.82205387 0.82278884 0.81425885 0.82591228 0.83601166 0.81520244
0.83071047 0.81268144 0.82033995 0.82142857]
mean value: 0.822138838792747
key: test_jcc
value: [0.5 0.46875 0.58064516 0.4516129 0.41176471 0.36363636
0.48275862 0.53333333 0.5 0.5 ]
mean value: 0.4792501088057834
key: train_jcc
value: [0.59848485 0.60546875 0.58914729 0.61176471 0.6221374 0.59375
0.60740741 0.58687259 0.60077519 0.60629921]
mean value: 0.6022107396445928
MCC on Blind test: 0.49
Accuracy on Blind test: 0.81
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [1.22346926 5.16487861 5.78156376 3.83858371 1.55056047 1.29423809
0.99304223 2.79679513 3.26674962 1.57767797]
mean value: 2.7487558841705324
key: score_time
value: [0.013201 0.02823043 0.02088594 0.02056909 0.02059984 0.0205822
0.02068686 0.02088261 0.02082276 0.02058458]
mean value: 0.0207045316696167
key: test_mcc
value: [0.64850123 0.49781081 0.62455652 0.57031192 0.57031192 0.44250602
0.50822474 0.57789674 0.73663511 0.63960215]
mean value: 0.5816357147239263
key: train_mcc
value: [0.66764601 0.66977539 0.74152285 0.80197555 0.70737076 0.53141196
0.58027983 0.70480057 0.74133067 0.67564324]
mean value: 0.6821756844169496
key: test_accuracy
value: [0.85714286 0.81318681 0.85714286 0.82417582 0.82417582 0.8
0.82222222 0.84444444 0.9 0.84444444]
mean value: 0.8386935286935286
key: train_accuracy
value: [0.86732187 0.87592138 0.9017199 0.91891892 0.88083538 0.82944785
0.84294479 0.88834356 0.89079755 0.86503067]
mean value: 0.8761281861895359
key: test_fscore
value: [0.74509804 0.60465116 0.71111111 0.69230769 0.69230769 0.55
0.6 0.66666667 0.8 0.74074074]
mean value: 0.6802883105140287
key: train_fscore
value: [0.75892857 0.72176309 0.79166667 0.85652174 0.78867102 0.6017192
0.6751269 0.76606684 0.81341719 0.76694915]
mean value: 0.7540830369215625
key: test_precision
value: [0.7037037 0.72222222 0.8 0.66666667 0.66666667 0.6875
0.75 0.77777778 0.85714286 0.66666667]
mean value: 0.729834656084656
key: train_precision
value: [0.74561404 0.90972222 0.92121212 0.81742739 0.75416667 0.81395349
0.76436782 0.8816568 0.75486381 0.71825397]
mean value: 0.8081238321762161
key: test_recall
value: [0.79166667 0.52 0.64 0.72 0.72 0.45833333
0.5 0.58333333 0.75 0.83333333]
mean value: 0.6516666666666666
key: train_recall
value: [0.77272727 0.59817352 0.69406393 0.89954338 0.82648402 0.47727273
0.60454545 0.67727273 0.88181818 0.82272727]
mean value: 0.7254628476546284
key: test_roc_auc
value: [0.83613184 0.72212121 0.78969697 0.79181818 0.79181818 0.69128788
0.71969697 0.76136364 0.85227273 0.84090909]
mean value: 0.7797116689280869
key: train_roc_auc
value: [0.83754209 0.78816239 0.83610759 0.9127969 0.86366218 0.7184683
0.76781895 0.82182964 0.88796791 0.85169977]
mean value: 0.8286055714661678
key: test_jcc
value: [0.59375 0.43333333 0.55172414 0.52941176 0.52941176 0.37931034
0.42857143 0.5 0.66666667 0.58823529]
mean value: 0.5200414734859461
key: train_jcc
value: [0.61151079 0.56465517 0.65517241 0.74904943 0.65107914 0.43032787
0.50957854 0.62083333 0.68551237 0.62199313]
mean value: 0.6099712184808272
MCC on Blind test: 0.58
Accuracy on Blind test: 0.84
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06455588 0.04757524 0.05126166 0.04958081 0.07174301 0.09911942
0.0670929 0.05137992 0.04852629 0.05046058]
mean value: 0.06012957096099854
key: score_time
value: [0.01296258 0.01292348 0.0129292 0.01287627 0.02691889 0.02911472
0.01300859 0.03288078 0.01305461 0.01302671]
mean value: 0.017969584465026854
key: test_mcc
value: [0.73758379 0.5916592 0.67809672 0.76472717 0.71465185 0.61348661
0.67187336 0.73450514 0.67722905 0.69290233]
mean value: 0.6876715224801788
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9010989 0.83516484 0.86813187 0.89010989 0.89010989 0.84444444
0.87777778 0.88888889 0.87777778 0.86666667]
mean value: 0.874017094017094
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8 0.70588235 0.76923077 0.82758621 0.7826087 0.72
0.73170732 0.80769231 0.75555556 0.77777778]
mean value: 0.7678040982819483
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.69230769 0.74074074 0.72727273 0.85714286 0.69230769
0.88235294 0.75 0.80952381 0.7 ]
mean value: 0.7708791317614847
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.72 0.8 0.96 0.72 0.75
0.625 0.875 0.70833333 0.875 ]
mean value: 0.7783333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85261194 0.79939394 0.8469697 0.91181818 0.83727273 0.81439394
0.79734848 0.8844697 0.82386364 0.86931818]
mean value: 0.8437460425146992
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.66666667 0.54545455 0.625 0.70588235 0.64285714 0.5625
0.57692308 0.67741935 0.60714286 0.63636364]
mean value: 0.6246209633187811
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
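Note on the Decision Tree block above: every train metric is exactly 1.0 while the test folds are clearly lower, which is the usual signature of an unpruned tree memorising its training folds. The short sketch below quantifies that gap for MCC using the fold values copied from this block. The variable names are illustrative.

```python
import numpy as np

# Fold-wise MCC values copied from the Decision Tree block of this log.
test_mcc = np.array([
    0.73758379, 0.5916592, 0.67809672, 0.76472717, 0.71465185,
    0.61348661, 0.67187336, 0.73450514, 0.67722905, 0.69290233,
])
train_mcc = np.ones(10)  # the log reports 1.0 for every training fold

# Difference between mean train and mean test score across the 10 folds.
gap = train_mcc.mean() - test_mcc.mean()

print(f"mean test MCC:  {test_mcc.mean():.4f}")
print(f"mean train MCC: {train_mcc.mean():.4f}")
print(f"train-test gap: {gap:.4f}")
print(f"test MCC std:   {test_mcc.std():.4f}")
```

The same comparison can be repeated for any other block by pasting its `test_mcc` and `train_mcc` arrays. A gap near zero, as for the Naive Bayes blocks, suggests far less overfitting.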
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.34010911 0.24522424 0.31502557 0.2490201 0.29990029 0.24824834
0.33211732 0.20768809 0.17433023 0.21552944]
mean value: 0.2627192735671997
key: score_time
value: [0.02708316 0.0260675 0.02612591 0.02618861 0.02614689 0.02590108
0.02695704 0.0190289 0.02099538 0.02617288]
mean value: 0.025066733360290527
key: test_mcc
value: [0.59397623 0.54842288 0.48092924 0.53935989 0.59779054 0.49901088
0.51508188 0.67314951 0.67419986 0.52378493]
mean value: 0.5645705839858854
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84615385 0.82417582 0.8021978 0.82417582 0.84615385 0.81111111
0.82222222 0.87777778 0.87777778 0.82222222]
mean value: 0.8353968253968254
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.69565217 0.66666667 0.60869565 0.65217391 0.69565217 0.62222222
0.61904762 0.74418605 0.71794872 0.63636364]
mean value: 0.6658608821803969
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.72727273 0.69565217 0.66666667 0.71428571 0.76190476 0.66666667
0.72222222 0.84210526 0.93333333 0.7 ]
mean value: 0.743010952942303
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.66666667 0.64 0.56 0.6 0.64 0.58333333
0.54166667 0.66666667 0.58333333 0.58333333]
mean value: 0.6065
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.78855721 0.7669697 0.7269697 0.75454545 0.78212121 0.73863636
0.73295455 0.81060606 0.78409091 0.74621212]
mean value: 0.7631663274536409
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.53333333 0.5 0.4375 0.48387097 0.53333333 0.4516129
0.44827586 0.59259259 0.56 0.46666667]
mean value: 0.5007185658962633
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.51
Accuracy on Blind test: 0.82
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01538968 0.0140624 0.01389027 0.01316977 0.01276588 0.01181865
0.01234031 0.01337767 0.01329684 0.01371527]
mean value: 0.013382673263549805
key: score_time
value: [0.01168752 0.01035738 0.01049376 0.0119102 0.01032829 0.00932217
0.01101422 0.00986791 0.00957704 0.01047921]
mean value: 0.010503768920898438
key: test_mcc
value: [0.30340909 0.38675467 0.44066378 0.50565559 0.2830303 0.11903254
0.26967994 0.51741002 0.26382243 0.44718 ]
mean value: 0.3536638368258284
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.71428571 0.76923077 0.74725275 0.81318681 0.71428571 0.67777778
0.74444444 0.8 0.73333333 0.77777778]
mean value: 0.7491575091575091
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.5 0.53333333 0.61016949 0.62222222 0.48 0.3255814
0.41025641 0.65384615 0.42857143 0.6 ]
mean value: 0.5163980435103809
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.46428571 0.6 0.52941176 0.7 0.48 0.36842105
0.53333333 0.60714286 0.5 0.57692308]
mean value: 0.5359517799022443
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.54166667 0.48 0.72 0.56 0.48 0.29166667
0.33333333 0.70833333 0.375 0.625 ]
mean value: 0.5115
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.65889303 0.67939394 0.73878788 0.73454545 0.64151515 0.55492424
0.61363636 0.77083333 0.61931818 0.72916667]
mean value: 0.6741014246947082
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.33333333 0.36363636 0.43902439 0.4516129 0.31578947 0.19444444
0.25806452 0.48571429 0.27272727 0.42857143]
mean value: 0.354291841171008
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.33
Accuracy on Blind test: 0.75
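Note: the per-fold metrics above can be cross-checked against one another. The sketch below is an illustration and is not part of the logged script. It recomputes MCC and accuracy from confusion-matrix counts. The counts (TP=13, FP=15, FN=11, TN=52) are not logged: they are inferred here from the first Extra Tree fold's precision (13/28), recall (13/24) and accuracy (65/91), so treat them as an assumption. The function names are hypothetical.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when a row or column of the
    # confusion matrix is empty and the coefficient is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Counts inferred (not logged) from the first Extra Tree fold.
tp, fp, fn, tn = 13, 15, 11, 52
fold_mcc = mcc(tp, fp, fn, tn)
fold_acc = accuracy(tp, fp, fn, tn)
print(f"MCC: {fold_mcc:.4f}  accuracy: {fold_acc:.4f}")
```

If the inferred counts are right, both values should agree with the first entries of test_mcc and test_accuracy in the Extra Tree block.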
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [2.64140868 3.81011534 3.99258542 4.03306317 4.11460781 4.15739083
4.4136889 3.99358988 4.03670621 3.72725749]
mean value: 3.8920413732528685
key: score_time
value: [0.10188866 0.2263751 0.13317132 0.13334799 0.1331985 0.13286519
0.13206124 0.13259816 0.13317728 0.10428119]
mean value: 0.13629646301269532
key: test_mcc
value: [0.74218994 0.62420432 0.83151316 0.62455652 0.80485509 0.57966713
0.61158096 0.77272727 0.79628662 0.71590909]
mean value: 0.7103490118151145
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.9010989 0.84615385 0.93406593 0.85714286 0.92307692 0.83333333
0.85555556 0.91111111 0.92222222 0.88888889]
mean value: 0.8872649572649572
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.80851064 0.73076923 0.86956522 0.71111111 0.85714286 0.69387755
0.69767442 0.83333333 0.8372093 0.79166667]
mean value: 0.7830860326663017
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.82608696 0.7037037 0.95238095 0.8 0.875 0.68
0.78947368 0.83333333 0.94736842 0.79166667]
mean value: 0.8199013717869553
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.79166667 0.76 0.8 0.64 0.84 0.70833333
0.625 0.83333333 0.75 0.79166667]
mean value: 0.754
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.86598259 0.81939394 0.89242424 0.78969697 0.89727273 0.79356061
0.78219697 0.88636364 0.86742424 0.85795455]
mean value: 0.8452270465852556
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.67857143 0.57575758 0.76923077 0.55172414 0.75 0.53125
0.53571429 0.71428571 0.72 0.65517241]
mean value: 0.6481706325283911
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
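Note: every train_* score in the Random Forest block is 1.0, while the test scores are lower. A minimal sketch, using only the standard library and values copied from the block above, of how the "mean value" line and the train/test gap can be recomputed. The variable names are hypothetical and not taken from the logged script.

```python
from statistics import mean

# Per-fold test_mcc values copied from the Random Forest block above.
test_mcc = [
    0.74218994, 0.62420432, 0.83151316, 0.62455652, 0.80485509,
    0.57966713, 0.61158096, 0.77272727, 0.79628662, 0.71590909,
]
# train_mcc was logged as 1.0 on all ten folds.
train_mcc = [1.0] * 10

mean_test = mean(test_mcc)
gap = mean(train_mcc) - mean_test
print(f"mean test MCC: {mean_test:.4f}, train-test gap: {gap:.4f}")
```

A large gap of this kind is commonly read as a sign that the model fits the training folds more closely than it generalises. The blind-test MCC logged above is the more direct check.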
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [2.11717629 1.22336912 1.25936317 1.12006736 1.14099693 1.24300289
1.19389844 1.1325438 1.14318395 1.1529448 ]
mean value: 1.2726546764373778
key: score_time
value: [0.22170925 0.1627214 0.2340374 0.13864779 0.16498137 0.19844413
0.16833639 0.17471385 0.2893374 0.24061823]
mean value: 0.19935472011566163
key: test_mcc
value: [0.74218994 0.69312083 0.83151316 0.62455652 0.71465185 0.56837381
0.61158096 0.79879562 0.73606509 0.61782299]
mean value: 0.6938670782096882
key: train_mcc
value: [0.9214512 0.94658694 0.92441313 0.93079515 0.9275678 0.92467251
0.93417988 0.93414825 0.92467251 0.92788451]
mean value: 0.9296371873548671
key: test_accuracy
value: [0.9010989 0.87912088 0.93406593 0.85714286 0.89010989 0.83333333
0.85555556 0.92222222 0.9 0.85555556]
mean value: 0.8828205128205128
key: train_accuracy
value: [0.96928747 0.97911548 0.97051597 0.97297297 0.97174447 0.97055215
0.97423313 0.97423313 0.97055215 0.97177914]
mean value: 0.9724986056887898
key: test_fscore
value: [0.80851064 0.7755102 0.86956522 0.71111111 0.7826087 0.68085106
0.69767442 0.85106383 0.7804878 0.71111111]
mean value: 0.7668494094744926
key: train_fscore
value: [0.94172494 0.96037296 0.94339623 0.94811321 0.94613583 0.94366197
0.95081967 0.9512761 0.94366197 0.94588235]
mean value: 0.9475045238264362
key: test_precision
value: [0.82608696 0.79166667 0.95238095 0.8 0.85714286 0.69565217
0.78947368 0.86956522 0.94117647 0.76190476]
mean value: 0.8285049740720086
key: train_precision
value: [0.96650718 0.98095238 0.97560976 0.9804878 0.97115385 0.97572816
0.98067633 0.97156398 0.97572816 0.9804878 ]
mean value: 0.975889539021806
key: test_recall
value: [0.79166667 0.76 0.8 0.64 0.72 0.66666667
0.625 0.83333333 0.66666667 0.66666667]
mean value: 0.717
key: train_recall
value: [0.91818182 0.94063927 0.91324201 0.91780822 0.92237443 0.91363636
0.92272727 0.93181818 0.91363636 0.91363636]
mean value: 0.9207700290577003
key: test_roc_auc
value: [0.86598259 0.84212121 0.89242424 0.78969697 0.83727273 0.78030303
0.78219697 0.89393939 0.82575758 0.79545455]
mean value: 0.8305149253731343
key: train_roc_auc
value: [0.95319865 0.96695829 0.95241932 0.95554277 0.9561452 0.9526165
0.95800229 0.96086707 0.9526165 0.95345684]
mean value: 0.9561823435614732
key: test_jcc
value: [0.67857143 0.63333333 0.76923077 0.55172414 0.64285714 0.51612903
0.53571429 0.74074074 0.64 0.55172414]
mean value: 0.6260025008567834
key: train_jcc
value: [0.88986784 0.92376682 0.89285714 0.90134529 0.89777778 0.89333333
0.90625 0.90707965 0.89333333 0.89732143]
mean value: 0.9002932610923725
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.02068615 0.04053044 0.04051805 0.04072928 0.04056287 0.01637459
0.01640034 0.01641941 0.01644111 0.02096176]
mean value: 0.02696239948272705
key: score_time
value: [0.01229119 0.02326393 0.02206826 0.02382994 0.01262593 0.01253796
0.01262951 0.01260781 0.01265311 0.01269269]
mean value: 0.015720033645629884
key: test_mcc
value: [0.52551942 0.60507041 0.45746799 0.30563727 0.39333333 0.30012252
0.40291148 0.64071161 0.4755188 0.44718 ]
mean value: 0.45534728443319816
key: train_mcc
value: [0.53038838 0.50974707 0.52491489 0.53390093 0.54820336 0.54668176
0.49997774 0.50534121 0.55293504 0.52295749]
mean value: 0.5275047871297697
key: test_accuracy
value: [0.81318681 0.84615385 0.79120879 0.73626374 0.75824176 0.73333333
0.77777778 0.86666667 0.8 0.77777778]
mean value: 0.7900610500610501
key: train_accuracy
value: [0.81449631 0.81326781 0.81695332 0.82063882 0.82678133 0.82208589
0.80981595 0.80613497 0.82453988 0.81717791]
mean value: 0.8171892193364586
key: test_fscore
value: [0.65306122 0.70833333 0.59574468 0.47826087 0.56 0.47826087
0.54545455 0.71428571 0.60869565 0.6 ]
mean value: 0.5942096889718801
key: train_fscore
value: [0.65759637 0.63285024 0.64775414 0.65402844 0.66348449 0.66819222
0.62469734 0.63761468 0.67276888 0.64439141]
mean value: 0.6503378195409838
key: test_precision
value: [0.64 0.73913043 0.63636364 0.52380952 0.56 0.5
0.6 0.83333333 0.63636364 0.57692308]
mean value: 0.6245923641575816
key: train_precision
value: [0.6561086 0.67179487 0.67156863 0.67980296 0.695 0.67281106
0.66839378 0.64351852 0.67741935 0.67839196]
mean value: 0.6714809727643422
key: test_recall
value: [0.66666667 0.68 0.56 0.44 0.56 0.45833333
0.5 0.625 0.58333333 0.625 ]
mean value: 0.5698333333333333
key: train_recall
value: [0.65909091 0.59817352 0.62557078 0.63013699 0.6347032 0.66363636
0.58636364 0.63181818 0.66818182 0.61363636]
mean value: 0.6311311747613118
key: test_roc_auc
value: [0.76616915 0.79454545 0.71939394 0.64424242 0.69666667 0.64583333
0.68939394 0.78977273 0.73106061 0.72916667]
mean value: 0.7206244911804613
key: train_roc_auc
value: [0.76557239 0.74530525 0.75648287 0.76044664 0.76609109 0.77215432
0.73940031 0.75120321 0.77526738 0.75303667]
mean value: 0.7584960120757864
key: test_jcc
value: [0.48484848 0.5483871 0.42424242 0.31428571 0.38888889 0.31428571
0.375 0.55555556 0.4375 0.42857143]
mean value: 0.4271565307452404
key: train_jcc
value: [0.48986486 0.46289753 0.47902098 0.48591549 0.49642857 0.50171821
0.45422535 0.46801347 0.50689655 0.47535211]
mean value: 0.4820333132358686
MCC on Blind test: 0.48
Accuracy on Blind test: 0.81
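Note: assuming that test_jcc denotes the Jaccard index (the log does not define the abbreviation), it is tied to the F1 score of the same predictions by J = F1 / (2 - F1). This follows from J = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN). The sketch below checks that relation against the first Naive Bayes fold. The function name is hypothetical.

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score for the same binary predictions."""
    return f1 / (2.0 - f1)


# First-fold test_fscore copied from the Naive Bayes block above.
f1_fold1 = 0.65306122
jcc_fold1 = jaccard_from_f1(f1_fold1)
print(f"Jaccard implied by F1: {jcc_fold1:.6f}")
```

The result can be compared with the first entry of test_jcc in the same block. Because the two metrics are monotonically related, they rank the folds identically and carry the same information.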
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [2.75444412 2.56730151 2.74305201 2.68936348 2.72433519 2.71856141
2.70131421 2.74700999 2.77551627 2.747895 ]
mean value: 2.7168793201446535
key: score_time
value: [0.01431656 0.0132854 0.01292968 0.0139606 0.0135746 0.01367378
0.01520157 0.01331019 0.01399136 0.01370096]
mean value: 0.013794469833374023
key: test_mcc
value: [0.81227628 0.53716427 0.88969697 0.78588153 0.77501303 0.71328456
0.73471806 0.83522876 0.79604116 0.80405441]
mean value: 0.7683359042790965
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92307692 0.81318681 0.95604396 0.91208791 0.91208791 0.87777778
0.9 0.93333333 0.92222222 0.92222222]
mean value: 0.9072039072039072
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8627451 0.66666667 0.92 0.84615385 0.83333333 0.79245283
0.79069767 0.88 0.84444444 0.85714286]
mean value: 0.8293636750387647
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.81481481 0.65384615 0.92 0.81481481 0.86956522 0.72413793
0.89473684 0.84615385 0.9047619 0.84 ]
mean value: 0.8282831524922585
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.91666667 0.68 0.92 0.88 0.8 0.875
0.70833333 0.91666667 0.79166667 0.875 ]
mean value: 0.8363333333333334
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9210199 0.77181818 0.94484848 0.90212121 0.87727273 0.87689394
0.83901515 0.9280303 0.88068182 0.90719697]
mean value: 0.88488986883763
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75862069 0.5 0.85185185 0.73333333 0.71428571 0.65625
0.65384615 0.78571429 0.73076923 0.75 ]
mean value: 0.7134671259455743
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.9
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.06290793 0.09113836 0.09762669 0.08045506 0.1088562 0.09835911
0.10149932 0.09140635 0.09574938 0.07322192]
mean value: 0.09012203216552735
key: score_time
value: [0.03614759 0.01270986 0.02036381 0.02011633 0.02052951 0.02215934
0.02006292 0.01273179 0.02016211 0.01268744]
mean value: 0.019767069816589357
key: test_mcc
value: [0.68163138 0.48266935 0.60314831 0.45746799 0.46252711 0.5716838
0.51076836 0.61158096 0.65091508 0.55805107]
mean value: 0.5590443404174392
key: train_mcc
value: [0.74113148 0.74177387 0.7319774 0.73266902 0.74663937 0.7338776
0.73357097 0.71200359 0.71200359 0.72874952]
mean value: 0.7314396423353031
key: test_accuracy
value: [0.86813187 0.79120879 0.83516484 0.79120879 0.78021978 0.82222222
0.81111111 0.85555556 0.86666667 0.82222222]
mean value: 0.8243711843711844
key: train_accuracy
value: [0.8980344 0.8992629 0.8955774 0.8955774 0.9004914 0.89693252
0.89570552 0.88711656 0.88711656 0.89447853]
mean value: 0.8950293182195023
key: test_fscore
value: [0.76923077 0.62745098 0.71698113 0.59574468 0.61538462 0.69230769
0.63829787 0.69767442 0.73913043 0.68 ]
mean value: 0.6772202595969455
key: train_fscore
value: [0.81093394 0.81018519 0.80278422 0.80369515 0.81464531 0.8028169
0.8045977 0.78899083 0.78899083 0.8 ]
mean value: 0.8027640061671473
key: test_precision
value: [0.71428571 0.61538462 0.67857143 0.63636364 0.59259259 0.64285714
0.65217391 0.78947368 0.77272727 0.65384615]
mean value: 0.6748276153882561
key: train_precision
value: [0.81278539 0.82159624 0.81603774 0.81308411 0.81651376 0.83009709
0.81395349 0.7962963 0.7962963 0.81904762]
mean value: 0.8135708029116734
key: test_recall
value: [0.83333333 0.64 0.76 0.56 0.64 0.75
0.625 0.625 0.70833333 0.70833333]
mean value: 0.685
key: train_recall
value: [0.80909091 0.79908676 0.78995434 0.79452055 0.81278539 0.77727273
0.79545455 0.78181818 0.78181818 0.78181818]
mean value: 0.7923619759236198
key: test_roc_auc
value: [0.85696517 0.74424242 0.81181818 0.71939394 0.73666667 0.79924242
0.75189394 0.78219697 0.81628788 0.78598485]
mean value: 0.7804692446856626
key: train_roc_auc
value: [0.87003367 0.86761061 0.86220406 0.86364683 0.87277925 0.8592246
0.86411383 0.8539343 0.8539343 0.85897632]
mean value: 0.862645775897186
key: test_jcc
value: [0.625 0.45714286 0.55882353 0.42424242 0.44444444 0.52941176
0.46875 0.53571429 0.5862069 0.51515152]
mean value: 0.5144887717364898
key: train_jcc
value: [0.68199234 0.68093385 0.67054264 0.67181467 0.68725869 0.67058824
0.67307692 0.65151515 0.65151515 0.66666667]
mean value: 0.6705904312105113
MCC on Blind test: 0.57
Accuracy on Blind test: 0.84
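
The per-fold blocks above follow a fixed `key:` / `value: [...]` / `mean value:` layout, so they can be read back into a structure for comparison across models. The following is a minimal sketch using only the standard library. The function name `parse_cv_blocks` is hypothetical and not part of the original scripts; the sample text is copied from the LDA `test_recall` block above.

```python
import re
from statistics import mean

def parse_cv_blocks(text):
    """Parse 'key:' / 'value: [...]' / 'mean value:' blocks into a dict.

    Hypothetical helper: assumes every block uses exactly this three-part layout.
    """
    pattern = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")
    results = {}
    for name, values, reported_mean in pattern.findall(text):
        # Fold values are whitespace-separated and may wrap across lines.
        folds = [float(v) for v in values.split()]
        results[name] = {"folds": folds, "reported_mean": float(reported_mean)}
    return results

sample = """key: test_recall
value: [0.83333333 0.64 0.76 0.56 0.64 0.75
0.625 0.625 0.70833333 0.70833333]
mean value: 0.685
"""

parsed = parse_cv_blocks(sample)
recomputed = mean(parsed["test_recall"]["folds"])
print(len(parsed["test_recall"]["folds"]), round(recomputed, 6))
```

Recomputing the mean from the parsed folds is a cheap sanity check that a block was not truncated when the log was written.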
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02526474 0.01550508 0.02460146 0.01528096 0.03058648 0.01548696
0.01551771 0.0158608 0.01576138 0.01571155]
mean value: 0.01895771026611328
key: score_time
value: [0.01229072 0.01626825 0.01230955 0.01229024 0.02189422 0.01223373
0.01224208 0.01224113 0.01226044 0.01224518]
mean value: 0.01362755298614502
key: test_mcc
value: [0.53931786 0.54842288 0.51496742 0.49177534 0.46965229 0.51076836
0.54545455 0.64465837 0.48863636 0.50261554]
mean value: 0.5256268946384712
key: train_mcc
value: [0.51519725 0.53378947 0.52880604 0.53260773 0.55896121 0.54917868
0.54291619 0.52792195 0.55056103 0.54668176]
mean value: 0.5386621323453594
key: test_accuracy
value: [0.81318681 0.82417582 0.81318681 0.8021978 0.79120879 0.81111111
0.82222222 0.86666667 0.8 0.8 ]
mean value: 0.8143956043956044
key: train_accuracy
value: [0.80958231 0.81818182 0.81572482 0.81695332 0.82678133 0.82331288
0.8208589 0.81472393 0.82208589 0.82208589]
mean value: 0.8190291071886164
key: test_fscore
value: [0.66666667 0.66666667 0.63829787 0.625 0.6122449 0.63829787
0.66666667 0.72727273 0.625 0.64 ]
mean value: 0.6506113369912762
key: train_fscore
value: [0.64530892 0.65740741 0.65437788 0.65747126 0.67734554 0.66972477
0.66513761 0.65446224 0.67268623 0.66819222]
mean value: 0.662211409201409
key: test_precision
value: [0.62962963 0.69565217 0.68181818 0.65217391 0.625 0.65217391
0.66666667 0.8 0.625 0.61538462]
mean value: 0.6643499093499093
key: train_precision
value: [0.64976959 0.66666667 0.66046512 0.66203704 0.67889908 0.67592593
0.6712963 0.65898618 0.66816143 0.67281106]
mean value: 0.666501838002788
key: test_recall
value: [0.70833333 0.64 0.6 0.6 0.6 0.625
0.66666667 0.66666667 0.625 0.66666667]
mean value: 0.6398333333333334
key: train_recall
value: [0.64090909 0.64840183 0.64840183 0.65296804 0.67579909 0.66363636
0.65909091 0.65 0.67727273 0.66363636]
mean value: 0.6580116230801162
key: test_roc_auc
value: [0.7795398 0.7669697 0.7469697 0.73939394 0.73181818 0.75189394
0.77272727 0.8030303 0.74431818 0.75757576]
mean value: 0.7594236770691994
key: train_roc_auc
value: [0.75648148 0.76453705 0.76285638 0.76513948 0.77907601 0.77299465
0.76988159 0.76281513 0.77645149 0.77215432]
mean value: 0.768238757243592
key: test_jcc
value: [0.5 0.5 0.46875 0.45454545 0.44117647 0.46875
0.5 0.57142857 0.45454545 0.47058824]
mean value: 0.4829784186401833
key: train_jcc
value: [0.47635135 0.48965517 0.48630137 0.48972603 0.51211073 0.50344828
0.49828179 0.48639456 0.50680272 0.50171821]
mean value: 0.4950790202442651
MCC on Blind test: 0.52
Accuracy on Blind test: 0.82
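
The `test_jcc` values above are not independent of `test_fscore`: for a binary problem the Jaccard index J and the F1 score F are both functions of TP, FP and FN, and satisfy J = F / (2 - F). The sketch below checks this on the first five Multinomial folds copied from the log; the function name `jaccard_from_f1` is an illustrative assumption.

```python
def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index: J = F / (2 - F)."""
    return f1 / (2.0 - f1)

# First five folds of the Multinomial run, copied from the log above.
test_fscore = [0.66666667, 0.66666667, 0.63829787, 0.625, 0.6122449]
test_jcc = [0.5, 0.5, 0.46875, 0.45454545, 0.44117647]

derived = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(max_abs_diff)
```

Because one metric determines the other, ranking models by `jcc` gives the same order as ranking them by `fscore` on a per-fold basis.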
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.0364356 0.02421284 0.03152752 0.05748749 0.02250338 0.02640176
0.023844 0.02884269 0.02368212 0.03187656]
mean value: 0.030681395530700685
key: score_time
value: [0.01122618 0.01779962 0.03136301 0.0173564 0.01226211 0.01223731
0.01209617 0.01216483 0.01215291 0.01216865]
mean value: 0.015082716941833496
key: test_mcc
value: [0.29343545 0.59651016 0.60507041 0.55878788 0.46252711 0.60188445
0.42967947 0.64465837 0.68358472 0.29395676]
mean value: 0.5170094779682164
key: train_mcc
value: [0.30971175 0.71001953 0.72215841 0.75055279 0.72811768 0.6782196
0.43138877 0.66381467 0.71322062 0.45778342]
mean value: 0.6164987243986632
key: test_accuracy
value: [0.76923077 0.82417582 0.84615385 0.82417582 0.78021978 0.82222222
0.8 0.86666667 0.87777778 0.76666667]
mean value: 0.8177289377289377
key: train_accuracy
value: [0.76535627 0.86486486 0.89189189 0.8992629 0.88943489 0.86134969
0.79509202 0.87116564 0.87361963 0.80368098]
mean value: 0.8515718786270934
key: test_fscore
value: [0.27586207 0.71428571 0.70833333 0.68 0.61538462 0.71428571
0.4375 0.72727273 0.76595745 0.32258065]
mean value: 0.5961462265497423
key: train_fscore
value: [0.24505929 0.78764479 0.79534884 0.81938326 0.80349345 0.76891616
0.39272727 0.74820144 0.79275654 0.44055944]
mean value: 0.6594090469875463
key: test_precision
value: [0.8 0.64516129 0.73913043 0.68 0.59259259 0.625
0.875 0.8 0.7826087 0.71428571]
mean value: 0.725377872763567
key: train_precision
value: [0.93939394 0.68227425 0.81042654 0.79148936 0.76987448 0.69888476
0.98181818 0.79187817 0.71119134 0.95454545]
mean value: 0.8131776468916367
key: test_recall
value: [0.16666667 0.8 0.68 0.68 0.64 0.83333333
0.29166667 0.66666667 0.75 0.20833333]
mean value: 0.5716666666666667
key: train_recall
value: [0.14090909 0.93150685 0.78082192 0.84931507 0.84018265 0.85454545
0.24545455 0.70909091 0.89545455 0.28636364]
mean value: 0.6533644665836447
key: test_roc_auc
value: [0.57587065 0.81666667 0.79454545 0.77939394 0.73666667 0.82575758
0.63825758 0.8030303 0.83712121 0.58901515]
mean value: 0.7396325192220715
key: train_roc_auc
value: [0.56877104 0.88592149 0.85679751 0.88348106 0.87387284 0.8592055
0.62188694 0.82009167 0.88050038 0.64066081]
mean value: 0.7891189251402788
key: test_jcc
value: [0.16 0.55555556 0.5483871 0.51515152 0.44444444 0.55555556
0.28 0.57142857 0.62068966 0.19230769]
mean value: 0.44435200863899416
key: train_jcc
value: [0.13963964 0.64968153 0.66023166 0.69402985 0.67153285 0.62458472
0.24434389 0.59770115 0.65666667 0.28251121]
mean value: 0.5220923161860291
MCC on Blind test: 0.74
Accuracy on Blind test: 0.89
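
The first Passive Aggresive fold above shows why MCC is reported alongside accuracy: accuracy is 0.769 while MCC is only 0.293. The sketch below recomputes MCC from a confusion matrix. The counts (TP=4, FP=1, FN=20, TN=66) are not printed in the log; they are an assumption inferred from the reported recall (0.1667), precision (0.8) and accuracy (0.7692) for a 91-sample fold.

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty and MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Inferred counts for fold 1 (assumption, see the paragraph above).
tp, fp, fn, tn = 4, 1, 20, 66

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
fold_mcc = mcc(tp, fp, fn, tn)
print(round(accuracy, 4), round(recall, 4), round(precision, 4), round(fold_mcc, 4))
```

With these counts the classifier predicts the positive class only five times, so it is rarely wrong when it does (high precision) but misses most positives, which MCC penalises and accuracy largely hides.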
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03881431 0.03019404 0.03379011 0.03127623 0.03301024 0.02849579
0.0292964 0.03955412 0.06099486 0.02386975]
mean value: 0.03492958545684814
key: score_time
value: [0.01225305 0.0122292 0.01222324 0.01225877 0.01216602 0.01220107
0.02749681 0.01217771 0.01320577 0.01216698]
mean value: 0.013837862014770507
key: test_mcc
value: [0.60569466 0.48174234 0.56049517 0.41303145 0.41497432 0.53935989
0.50376657 0.5154942 0.64241792 0.4268753 ]
mean value: 0.5103851815446988
key: train_mcc
value: [0.63272729 0.65377037 0.64911594 0.4761944 0.57051517 0.56498372
0.69859831 0.47213543 0.66675209 0.67028614]
mean value: 0.6055078840398903
key: test_accuracy
value: [0.8021978 0.73626374 0.83516484 0.61538462 0.79120879 0.83333333
0.82222222 0.68888889 0.86666667 0.78888889]
mean value: 0.778021978021978
key: train_accuracy
value: [0.8022113 0.81941032 0.86855037 0.66830467 0.84152334 0.8392638
0.88588957 0.66134969 0.87484663 0.87607362]
mean value: 0.8137423312883436
key: test_fscore
value: [0.70967742 0.63636364 0.59459459 0.57831325 0.48648649 0.61538462
0.57894737 0.63157895 0.68421053 0.55813953]
mean value: 0.6073696382185204
key: train_fscore
value: [0.72572402 0.74165202 0.69859155 0.61206897 0.60790274 0.60182371
0.75067024 0.60906516 0.73298429 0.72922252]
mean value: 0.6809705210509759
key: test_precision
value: [0.57894737 0.51219512 0.91666667 0.4137931 0.75 0.8
0.78571429 0.46153846 0.92857143 0.63157895]
mean value: 0.6779005383679811
key: train_precision
value: [0.58038147 0.60285714 0.91176471 0.44654088 0.90909091 0.90825688
0.91503268 0.44238683 0.86419753 0.88888889]
mean value: 0.7469397921224509
key: test_recall
value: [0.91666667 0.84 0.44 0.96 0.36 0.5
0.45833333 1. 0.54166667 0.5 ]
mean value: 0.6516666666666666
key: train_recall
value: [0.96818182 0.96347032 0.56621005 0.97260274 0.456621 0.45
0.63636364 0.97727273 0.63636364 0.61818182]
mean value: 0.7245267745952677
key: test_roc_auc
value: [0.83893035 0.76848485 0.71242424 0.72242424 0.65727273 0.72727273
0.70643939 0.78787879 0.76325758 0.6969697 ]
mean value: 0.7381354590682949
key: train_roc_auc
value: [0.85446128 0.86492844 0.77302099 0.76445263 0.71990714 0.71659664
0.80725745 0.76090527 0.79969442 0.79480519]
mean value: 0.7856029453430742
key: test_jcc
value: [0.55 0.46666667 0.42307692 0.40677966 0.32142857 0.44444444
0.40740741 0.46153846 0.52 0.38709677]
mean value: 0.4388438909772972
key: train_jcc
value: [0.56951872 0.58938547 0.53679654 0.44099379 0.43668122 0.43043478
0.60085837 0.43788187 0.5785124 0.57383966]
mean value: 0.5194902824337679
MCC on Blind test: 0.62
Accuracy on Blind test: 0.86
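
The `roc_auc` values in this log are consistent with being computed from hard class predictions rather than from probabilities or decision scores. In that case the ROC curve has a single interior point and the area reduces to the balanced accuracy, (recall + specificity) / 2. The sketch below checks this for the eighth Stochastic GDescent fold. The counts (TP=24, FN=0, FP=28, TN=38) are an assumption inferred from the reported recall (1.0), precision (0.4615) and accuracy (0.6889) for a 90-sample fold.

```python
def auc_from_hard_labels(tp, fp, fn, tn):
    """ROC AUC when only hard 0/1 predictions are scored.

    With a single operating point the trapezoidal area equals
    (sensitivity + specificity) / 2, i.e. balanced accuracy.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

# Inferred counts for fold 8 (assumption, see the paragraph above).
tp, fp, fn, tn = 24, 28, 0, 38

fold_auc = auc_from_hard_labels(tp, fp, fn, tn)
fold_accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(fold_auc, 6), round(fold_accuracy, 6))
```

If a threshold-free AUC is wanted instead, the scorer would need access to `predict_proba` or `decision_function` output, which would generally give different numbers from those logged here.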
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.27724671 0.26051331 0.26959395 0.26172805 0.26124239 0.25981188
0.26019287 0.25977993 0.25926805 0.26434898]
mean value: 0.26337261199951173
key: score_time
value: [0.01599622 0.01610565 0.01713943 0.0160625 0.01607919 0.01607633
0.01584888 0.01627922 0.01608992 0.01617551]
mean value: 0.016185283660888672
key: test_mcc
value: [0.78071437 0.65648795 0.74898796 0.61393939 0.72424242 0.72435769
0.77979322 0.73450514 0.61782299 0.69186077]
mean value: 0.7072711911836023
key: train_mcc
value: [0.877768 0.90397549 0.90658396 0.91638126 0.90285852 0.89233213
0.90011331 0.87625403 0.86536718 0.88170359]
mean value: 0.8923337459745163
key: test_accuracy
value: [0.91208791 0.85714286 0.9010989 0.84615385 0.89010989 0.88888889
0.91111111 0.88888889 0.85555556 0.87777778]
mean value: 0.8828815628815628
key: train_accuracy
value: [0.95208845 0.96191646 0.96314496 0.96683047 0.96068796 0.95705521
0.9607362 0.95092025 0.94601227 0.95337423]
mean value: 0.9572766464177507
key: test_fscore
value: [0.84 0.75471698 0.81632653 0.72 0.8 0.8
0.84 0.80769231 0.71111111 0.7755102 ]
mean value: 0.7865357134629373
key: train_fscore
value: [0.91034483 0.93002257 0.93181818 0.93905192 0.92920354 0.92170022
0.92694064 0.90990991 0.90222222 0.91363636]
mean value: 0.9214850400078269
key: test_precision
value: [0.80769231 0.71428571 0.83333333 0.72 0.8 0.76923077
0.80769231 0.75 0.76190476 0.76 ]
mean value: 0.7724139194139195
key: train_precision
value: [0.92093023 0.91964286 0.92760181 0.92857143 0.90128755 0.90748899
0.93119266 0.90178571 0.8826087 0.91363636]
mean value: 0.9134746302784097
key: test_recall
value: [0.875 0.8 0.8 0.72 0.8 0.83333333
0.875 0.875 0.66666667 0.79166667]
mean value: 0.8036666666666666
key: train_recall
value: [0.9 0.94063927 0.93607306 0.94977169 0.95890411 0.93636364
0.92272727 0.91818182 0.92272727 0.91363636]
mean value: 0.9299024491490245
key: test_roc_auc
value: [0.90018657 0.83939394 0.86969697 0.8069697 0.86212121 0.87121212
0.89962121 0.8844697 0.79545455 0.85037879]
mean value: 0.8579504748982361
key: train_roc_auc
value: [0.93569024 0.95519358 0.95459115 0.96144047 0.96012432 0.95053476
0.94875859 0.94060351 0.93867456 0.9408518 ]
mean value: 0.9486462985637039
key: test_jcc
value: [0.72413793 0.60606061 0.68965517 0.5625 0.66666667 0.66666667
0.72413793 0.67741935 0.55172414 0.63333333]
mean value: 0.6502301799979775
key: train_jcc
value: [0.83544304 0.86919831 0.87234043 0.88510638 0.8677686 0.85477178
0.86382979 0.83471074 0.82186235 0.84100418]
mean value: 0.8546035601309547
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.1903882 0.21745563 0.21988535 0.23056459 0.21335316 0.09320831
0.09771919 0.22602701 0.21265531 0.1860373 ]
mean value: 0.1887294054031372
key: score_time
value: [0.02607679 0.03801036 0.0424521 0.04715228 0.01995468 0.03384686
0.0404861 0.02912593 0.0286665 0.01805949]
mean value: 0.03238310813903809
key: test_mcc
value: [0.80485509 0.53716427 0.78588153 0.78588153 0.72424242 0.68023136
0.76553182 0.72435769 0.76634133 0.74119017]
mean value: 0.7315677216232281
key: train_mcc
value: [0.97503169 0.99063281 0.99063281 0.9843585 0.98122438 0.97190427
0.98441714 0.99377387 0.97817158 0.99377387]
mean value: 0.984392091616367
key: test_accuracy
value: [0.92307692 0.81318681 0.91208791 0.91208791 0.89010989 0.86666667
0.91111111 0.88888889 0.91111111 0.9 ]
mean value: 0.8928327228327229
key: train_accuracy
value: [0.99017199 0.9963145 0.9963145 0.99385749 0.99262899 0.98895706
0.99386503 0.99754601 0.99141104 0.99754601]
mean value: 0.9938612622661702
key: test_fscore
value: [0.85714286 0.66666667 0.84615385 0.84615385 0.8 0.76923077
0.81818182 0.8 0.80952381 0.80851064]
mean value: 0.8021564251351486
key: train_fscore
value: [0.98173516 0.99310345 0.99310345 0.98850575 0.98623853 0.97940503
0.98861048 0.99545455 0.98390805 0.99545455]
mean value: 0.9885518985176559
key: test_precision
value: [0.84 0.65384615 0.81481481 0.81481481 0.8 0.71428571
0.9 0.76923077 0.94444444 0.82608696]
mean value: 0.8077523667958451
key: train_precision
value: [0.98623853 1. 1. 0.99537037 0.99078341 0.98617512
0.99086758 0.99545455 0.99534884 0.99545455]
mean value: 0.9935692935853153
key: test_recall
value: [0.875 0.68 0.88 0.88 0.8 0.83333333
0.75 0.83333333 0.70833333 0.79166667]
mean value: 0.8031666666666667
key: train_recall
value: [0.97727273 0.98630137 0.98630137 0.98173516 0.98173516 0.97272727
0.98636364 0.99545455 0.97272727 0.99545455]
mean value: 0.983607305936073
key: test_roc_auc
value: [0.90764925 0.77181818 0.90212121 0.90212121 0.86212121 0.85606061
0.85984848 0.87121212 0.84659091 0.8655303 ]
mean value: 0.8645073496155585
key: train_roc_auc
value: [0.98611111 0.99315068 0.99315068 0.99002724 0.98918691 0.98384263
0.99150115 0.99688694 0.9855233 0.99688694]
mean value: 0.9906267579676121
key: test_jcc
value: [0.75 0.5 0.73333333 0.73333333 0.66666667 0.625
0.69230769 0.66666667 0.68 0.67857143]
mean value: 0.6725879120879121
key: train_jcc
value: [0.96412556 0.98630137 0.98630137 0.97727273 0.97285068 0.95964126
0.97747748 0.99095023 0.96832579 0.99095023]
mean value: 0.9774196683696653
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
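Note on the block above: the test_jcc values appear to be fully determined by the test_fscore values. For a binary classifier, the Jaccard index J and the F1 score F of the same predictions satisfy J = F / (2 - F). The sketch below checks this against the ten fold values copied from the Bagging Classifier block. The helper name jaccard_from_f1 is made up for this illustration and is not part of the logged script.

```python
# Sketch: check the identity J = F / (2 - F) on the logged fold values.
# The two lists are copied from the Bagging Classifier block of this log;
# jaccard_from_f1 is a hypothetical helper, not a function of the script.

def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the Jaccard index of the same predictions."""
    return f1 / (2.0 - f1)

test_fscore = [0.85714286, 0.66666667, 0.84615385, 0.84615385, 0.8,
               0.76923077, 0.81818182, 0.8, 0.80952381, 0.80851064]
test_jcc = [0.75, 0.5, 0.73333333, 0.73333333, 0.66666667,
            0.625, 0.69230769, 0.66666667, 0.68, 0.67857143]

# Largest absolute difference between derived and logged Jaccard values.
derived_jcc = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived_jcc, test_jcc))

print(f"max |derived - logged| = {max_abs_diff:.2e}")
```

Because one metric follows from the other, test_jcc adds no independent information when ranking the models in this log.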
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.41849947 0.58950043 0.76989412 0.91573668 0.42887688 0.46664572
0.4057734 0.81402564 0.99178576 0.49223542]
mean value: 0.6292973518371582
key: score_time
value: [0.09931445 0.02061701 0.05184293 0.03578353 0.02096629 0.0361588
0.03552747 0.05565429 0.020648 0.04811263]
mean value: 0.04246253967285156
key: test_mcc
value: [0.53958171 0.24362069 0.31156172 0.28426762 0.41497432 0.31195619
0.35478744 0.5063517 0.42640143 0.32401531]
mean value: 0.3717518137838298
key: train_mcc
value: [0.88453181 0.87827 0.88766131 0.88416629 0.89358226 0.89428983
0.88144713 0.88144713 0.86891849 0.87797787]
mean value: 0.8832292128981972
key: test_accuracy
value: [0.83516484 0.74725275 0.75824176 0.75824176 0.79120879 0.76666667
0.77777778 0.82222222 0.8 0.76666667]
mean value: 0.7823443223443224
key: train_accuracy
value: [0.95454545 0.95208845 0.95577396 0.95454545 0.95823096 0.95828221
0.95337423 0.95337423 0.94846626 0.95214724]
mean value: 0.9540828446963416
key: test_fscore
value: [0.59459459 0.3030303 0.42105263 0.3125 0.48648649 0.4
0.44444444 0.52941176 0.47058824 0.43243243]
mean value: 0.43945408925672086
key: train_fscore
value: [0.90864198 0.90225564 0.91044776 0.90818859 0.91625616 0.91625616
0.90594059 0.90594059 0.895 0.9037037 ]
mean value: 0.9072631168301808
key: test_precision
value: [0.84615385 0.625 0.61538462 0.71428571 0.75 0.63636364
0.66666667 0.9 0.8 0.61538462]
mean value: 0.7169239094239095
key: train_precision
value: [0.99459459 1. 1. 0.99456522 0.99465241 1.
0.99456522 0.99456522 0.99444444 0.98918919]
mean value: 0.9956576286819253
key: test_recall
value: [0.45833333 0.2 0.32 0.2 0.36 0.29166667
0.33333333 0.375 0.33333333 0.33333333]
mean value: 0.3205
key: train_recall
value: [0.83636364 0.82191781 0.83561644 0.83561644 0.84931507 0.84545455
0.83181818 0.83181818 0.81363636 0.83181818]
mean value: 0.8333374844333749
key: test_roc_auc
value: [0.71424129 0.57727273 0.62212121 0.58484848 0.65727273 0.6155303
0.63636364 0.67992424 0.65151515 0.62878788]
mean value: 0.6367877657168702
key: train_roc_auc
value: [0.91734007 0.9109589 0.91780822 0.91696788 0.9238172 0.92272727
0.91506875 0.91506875 0.90597785 0.91422842]
mean value: 0.9159963318383947
key: test_jcc
value: [0.42307692 0.17857143 0.26666667 0.18518519 0.32142857 0.25
0.28571429 0.36 0.30769231 0.27586207]
mean value: 0.28541974373008855
key: train_jcc
value: [0.83257919 0.82191781 0.83561644 0.83181818 0.84545455 0.84545455
0.8280543 0.8280543 0.80995475 0.82432432]
mean value: 0.8303228377563591
MCC on Blind test: 0.3
Accuracy on Blind test: 0.77
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.28913355 1.31570816 1.21427155 1.3062737 1.31983781 1.22301054
1.25804472 1.31995225 1.30364299 1.22600722]
mean value: 1.2775882482528687
key: score_time
value: [0.010149 0.0146482 0.01436591 0.01001978 0.01055002 0.01526237
0.01032352 0.00997353 0.01475406 0.00955248]
mean value: 0.01195988655090332
key: test_mcc
value: [0.83591639 0.7098276 0.78588153 0.79423548 0.80485509 0.78877892
0.79879562 0.81147376 0.76784594 0.81147376]
mean value: 0.7909084087989999
key: train_mcc
value: [0.99067471 0.98750624 0.98748946 0.9843585 0.98133231 0.98754775
1. 0.99067895 0.98132162 0.99377387]
mean value: 0.9884683404995436
key: test_accuracy
value: [0.93406593 0.87912088 0.91208791 0.91208791 0.92307692 0.91111111
0.92222222 0.92222222 0.91111111 0.92222222]
mean value: 0.914932844932845
key: train_accuracy
value: [0.9963145 0.995086 0.995086 0.99385749 0.99262899 0.99509202
1. 0.99631902 0.99263804 0.99754601]
mean value: 0.9954568064997513
key: test_fscore
value: [0.88 0.79245283 0.84615385 0.85185185 0.85714286 0.84615385
0.85106383 0.8627451 0.82608696 0.8627451 ]
mean value: 0.8476396213878485
key: train_fscore
value: [0.99319728 0.99086758 0.99082569 0.98850575 0.98636364 0.99090909
1. 0.99319728 0.98636364 0.99545455]
mean value: 0.9915684482022545
key: test_precision
value: [0.84615385 0.75 0.81481481 0.79310345 0.875 0.78571429
0.86956522 0.81481481 0.86363636 0.81481481]
mean value: 0.8227617605616107
key: train_precision
value: [0.99095023 0.99086758 0.99539171 0.99537037 0.98190045 0.99090909
1. 0.99095023 0.98636364 0.99545455]
mean value: 0.9918157833052819
key: test_recall
value: [0.91666667 0.84 0.88 0.92 0.84 0.91666667
0.83333333 0.91666667 0.79166667 0.91666667]
mean value: 0.8771666666666667
key: train_recall
value: [0.99545455 0.99086758 0.98630137 0.98173516 0.99086758 0.99090909
1. 0.99545455 0.98636364 0.99545455]
mean value: 0.9913408053134081
key: test_roc_auc
value: [0.92848259 0.8669697 0.90212121 0.91454545 0.89727273 0.91287879
0.89393939 0.92045455 0.87310606 0.92045455]
mean value: 0.90302250113071
key: train_roc_auc
value: [0.99604377 0.99375312 0.99231035 0.99002724 0.99207245 0.99377387
1. 0.9960466 0.99066081 0.99688694]
mean value: 0.9941575146732279
key: test_jcc
value: [0.78571429 0.65625 0.73333333 0.74193548 0.75 0.73333333
0.74074074 0.75862069 0.7037037 0.75862069]
mean value: 0.736225226000671
key: train_jcc
value: [0.98648649 0.98190045 0.98181818 0.97727273 0.97309417 0.98198198
1. 0.98648649 0.97309417 0.99095023]
mean value: 0.9833084883586071
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
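Note on the block above: each "mean value" line is the arithmetic mean of the ten per-fold values printed before it. The difference between the train and test means gives a rough indication of overfitting. The sketch below recomputes both means for the Gradient Boosting MCC values copied from this log. The variable names are illustrative only, and the fold-to-fold standard deviation is an extra that the log itself does not report.

```python
# Sketch: recompute the logged fold means and the train/test gap for MCC.
# The lists are copied from the Gradient Boosting block of this log.
from statistics import mean, stdev

test_mcc = [0.83591639, 0.7098276, 0.78588153, 0.79423548, 0.80485509,
            0.78877892, 0.79879562, 0.81147376, 0.76784594, 0.81147376]
train_mcc = [0.99067471, 0.98750624, 0.98748946, 0.9843585, 0.98133231,
             0.98754775, 1.0, 0.99067895, 0.98132162, 0.99377387]

mean_test = mean(test_mcc)
mean_train = mean(train_mcc)
gap = mean_train - mean_test   # larger gap suggests more overfitting
spread = stdev(test_mcc)       # sample standard deviation across folds

print(f"mean test MCC  : {mean_test:.4f}")
print(f"mean train MCC : {mean_train:.4f}")
print(f"train-test gap : {gap:.4f}")
print(f"test MCC stdev : {spread:.4f}")
```

The same calculation can be repeated for any other key in the log to compare models on the gap as well as on the mean.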
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.14145827 0.06715274 0.04703712 0.0477314 0.07376885 0.05179882
0.08239913 0.05850697 0.14945626 0.13114095]
mean value: 0.08504505157470703
key: score_time
value: [0.013623 0.01364827 0.01348495 0.01350093 0.0139091 0.01341081
0.01348758 0.01463079 0.01364398 0.0333693 ]
mean value: 0.01567087173461914
key: test_mcc
value: [ 0.13146936 0.08884904 -0.20842878 0.21624971 -0.10050378 -0.15853511
0.14830704 0.06043672 -0.1571382 0.16261091]
mean value: 0.018331691626637253
key: train_mcc
value: [0.20644118 0.25718751 0.20851971 0.18754191 0.19327336 0.20749178
0.1992175 0.19500127 0.20061074 0.18198662]
mean value: 0.2037271576439099
key: test_accuracy
value: [0.38461538 0.40659341 0.26373626 0.38461538 0.30769231 0.27777778
0.36666667 0.31111111 0.28888889 0.37777778]
mean value: 0.3369474969474969
key: train_accuracy
value: [0.37346437 0.42137592 0.37469287 0.35626536 0.36117936 0.37423313
0.36687117 0.36319018 0.36809816 0.35214724]
mean value: 0.37115177642785
key: test_fscore
value: [0.44 0.4375 0.37383178 0.47169811 0.38834951 0.36893204
0.44660194 0.42592593 0.36 0.45098039]
mean value: 0.41638197021369017
key: train_fscore
value: [0.46315789 0.48184818 0.4625132 0.45530146 0.45720251 0.46315789
0.46025105 0.45881126 0.46073298 0.45454545]
mean value: 0.4617521880985164
key: test_precision
value: [0.28947368 0.29577465 0.24390244 0.30864198 0.25641026 0.24050633
0.29113924 0.27380952 0.23684211 0.29487179]
mean value: 0.27313719964058686
key: train_precision
value: [0.30136986 0.3173913 0.30082418 0.29475101 0.29634641 0.30136986
0.29891304 0.29769959 0.29931973 0.29411765]
mean value: 0.3002102642167985
key: test_recall
value: [0.91666667 0.84 0.8 1. 0.8 0.79166667
0.95833333 0.95833333 0.75 0.95833333]
mean value: 0.8773333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.55534826 0.54121212 0.43030303 0.57575758 0.46060606 0.44128788
0.55492424 0.51704545 0.43560606 0.5625 ]
mean value: 0.5074590682948892
key: train_roc_auc
value: [0.57070707 0.60420168 0.57226891 0.55966387 0.56302521 0.57142857
0.56638655 0.56386555 0.56722689 0.55630252]
mean value: 0.5695076818606231
key: test_jcc
value: [0.28205128 0.28 0.22988506 0.30864198 0.24096386 0.22619048
0.2875 0.27058824 0.2195122 0.29113924]
mean value: 0.26364723173657495
key: train_jcc
value: [0.30136986 0.3173913 0.30082418 0.29475101 0.29634641 0.30136986
0.29891304 0.29769959 0.29931973 0.29411765]
mean value: 0.3002102642167985
MCC on Blind test: 0.1
Accuracy on Blind test: 0.36
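Note on the block above: a train_recall of 1.0 combined with an accuracy near 0.37 suggests that QDA labels almost every sample as positive. The sketch below works through the first training fold. The confusion-matrix counts are not printed anywhere in this log. They are an inference from the rounded metrics (recall 1.0, precision 0.30136986, accuracy 0.37346437), assuming a training fold of 814 rows with 220 positives. The function name binary_metrics is hypothetical.

```python
# Sketch: reproduce the first-fold QDA training metrics from a confusion matrix.
# ASSUMPTION: the counts below are inferred from the rounded metrics in the
# log (814 training rows, 220 positives); the log does not print them.
import math

tp, fn = 220, 0     # every positive is predicted positive (recall = 1.0)
fp, tn = 510, 84    # most negatives are also predicted positive

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the metrics reported in the log from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "fscore": 2 * precision * recall / (precision + recall),
        "jcc": tp / (tp + fp + fn),
        # For hard 0/1 predictions, ROC AUC equals balanced accuracy.
        "roc_auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

metrics = binary_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name:10s} {value:.8f}")
```

With no false negatives, precision and the Jaccard index are the same ratio, TP / (TP + FP). This is why train_precision and train_jcc are identical in every fold of this block.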
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.0502224 0.06870103 0.07678366 0.05153155 0.07788491 0.06945562
0.07719898 0.06175971 0.06251836 0.01799321]
mean value: 0.06140494346618652
key: score_time
value: [0.03146505 0.02858329 0.03003502 0.03028297 0.02979541 0.02962589
0.03054166 0.03654528 0.01298213 0.02162504]
mean value: 0.028148174285888672
key: test_mcc
value: [0.66044776 0.58138656 0.69312083 0.53181644 0.42817442 0.57966713
0.45226702 0.64465837 0.73663511 0.48863636]
mean value: 0.579680999444377
key: train_mcc
value: [0.71598517 0.71445751 0.70526551 0.71583656 0.72435597 0.69503973
0.71752285 0.6986792 0.69083097 0.70780065]
mean value: 0.7085774124854681
key: test_accuracy
value: [0.86813187 0.83516484 0.87912088 0.82417582 0.76923077 0.83333333
0.8 0.86666667 0.9 0.8 ]
mean value: 0.8375824175824176
key: train_accuracy
value: [0.88943489 0.88943489 0.88574939 0.88943489 0.89189189 0.88220859
0.88957055 0.88220859 0.8797546 0.88711656]
mean value: 0.8866804841651468
key: test_fscore
value: [0.75 0.69387755 0.7755102 0.63636364 0.58823529 0.69387755
0.57142857 0.72727273 0.8 0.625 ]
mean value: 0.6861565535305031
key: train_fscore
value: [0.79069767 0.78873239 0.78220141 0.79069767 0.79816514 0.77358491
0.79262673 0.77880184 0.77209302 0.78301887]
mean value: 0.7850619654239601
key: test_precision
value: [0.75 0.70833333 0.79166667 0.73684211 0.57692308 0.68
0.66666667 0.8 0.85714286 0.625 ]
mean value: 0.7192574705995759
key: train_precision
value: [0.80952381 0.8115942 0.80288462 0.8056872 0.80184332 0.80392157
0.80373832 0.78971963 0.79047619 0.81372549]
mean value: 0.8033114342795749
key: test_recall
value: [0.75 0.68 0.76 0.56 0.6 0.70833333
0.5 0.66666667 0.75 0.625 ]
mean value: 0.66
key: train_recall
value: [0.77272727 0.76712329 0.76255708 0.77625571 0.79452055 0.74545455
0.78181818 0.76818182 0.75454545 0.75454545]
mean value: 0.7677729348277293
key: test_roc_auc
value: [0.83022388 0.7869697 0.84212121 0.74212121 0.71666667 0.79356061
0.70454545 0.8030303 0.85227273 0.74431818]
mean value: 0.7815829941203075
key: train_roc_auc
value: [0.8526936 0.85078853 0.84682476 0.85367407 0.86112582 0.83911383
0.85561497 0.84627578 0.84029794 0.84533995]
mean value: 0.8491749262317353
key: test_jcc
value: [0.6 0.53125 0.63333333 0.46666667 0.41666667 0.53125
0.4 0.57142857 0.66666667 0.45454545]
mean value: 0.5271807359307359
key: train_jcc
value: [0.65384615 0.65116279 0.64230769 0.65384615 0.66412214 0.63076923
0.65648855 0.63773585 0.62878788 0.64341085]
mean value: 0.6462477289047467
MCC on Blind test: 0.59
Accuracy on Blind test: 0.85
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.44820738 0.80184031 0.63607836 0.53813601 0.51265216 0.4790895
0.677001 0.53256416 0.59983206 0.68059635]
mean value: 0.5905997276306152
key: score_time
value: [0.04019618 0.02686477 0.02605987 0.03050518 0.0192945 0.04190254
0.01563525 0.03461552 0.02633142 0.03640819]
mean value: 0.029781341552734375
key: test_mcc
value: [0.66044776 0.52551942 0.70064905 0.53181644 0.42817442 0.57966713
0.4792982 0.64465837 0.73663511 0.48863636]
mean value: 0.5775502262717324
key: train_mcc
value: [0.71598517 0.72307926 0.72431999 0.71583656 0.72435597 0.69503973
0.72452859 0.6986792 0.70276203 0.70780065]
mean value: 0.7132387140179451
key: test_accuracy
value: [0.86813187 0.81318681 0.87912088 0.82417582 0.76923077 0.83333333
0.81111111 0.86666667 0.9 0.8 ]
mean value: 0.8364957264957265
key: train_accuracy
value: [0.88943489 0.89312039 0.89312039 0.88943489 0.89189189 0.88220859
0.89202454 0.88220859 0.88466258 0.88711656]
mean value: 0.8885223315898163
key: test_fscore
value: [0.75 0.65306122 0.78431373 0.63636364 0.58823529 0.69387755
0.58536585 0.72727273 0.8 0.625 ]
mean value: 0.6843490012412947
key: train_fscore
value:
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:118: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[0.79069767 0.79432624 0.79625293 0.79069767 0.79816514 0.77358491
0.79816514 0.77880184 0.78037383 0.78301887]
mean value: 0.7884084241280366
key: test_precision
value: [0.75 0.66666667 0.76923077 0.73684211 0.57692308 0.68
0.70588235 0.8 0.85714286 0.625 ]
mean value: 0.7167687828167705
key: train_precision
value: [0.80952381 0.82352941 0.81730769 0.8056872 0.80184332 0.80392157
0.80555556 0.78971963 0.80288462 0.81372549]
mean value: 0.8073698291291952
key: test_recall
value: [0.75 0.64 0.8 0.56 0.6 0.70833333
0.5 0.66666667 0.75 0.625 ]
mean value: 0.66
key: train_recall
value: [0.77272727 0.76712329 0.77625571 0.77625571 0.79452055 0.74545455
0.79090909 0.76818182 0.75909091 0.75454545]
mean value: 0.7705064342050643
key: test_roc_auc
value: [0.83022388 0.75939394 0.85454545 0.74212121 0.71666667 0.79356061
0.71212121 0.8030303 0.85227273 0.74431818]
mean value: 0.7808254183627318
key: train_roc_auc
value: [0.8526936 0.85330954 0.85619508 0.85367407 0.86112582 0.83911383
0.86016043 0.84627578 0.84509167 0.84533995]
mean value: 0.8512979784414112
key: test_jcc
value: [0.6 0.48484848 0.64516129 0.46666667 0.41666667 0.53125
0.4137931 0.57142857 0.66666667 0.45454545]
mean value: 0.5251026904593368
key: train_jcc
value: [0.65384615 0.65882353 0.6614786 0.65384615 0.66412214 0.63076923
0.66412214 0.63773585 0.63984674 0.64341085]
mean value: 0.6508001386969055
MCC on Blind test: 0.59
Accuracy on Blind test: 0.85
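Note on the two blind-test figures above: MCC and accuracy can diverge noticeably when the classes are imbalanced, because accuracy rewards correct majority-class predictions while MCC uses all four confusion-matrix cells. The sketch below uses purely hypothetical counts (the log does not report the blind-test confusion matrix) and plain-Python helpers `mcc` and `accuracy` that are illustrative, not the script's own functions.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 50 positives, 175 negatives.
counts = dict(tp=30, tn=160, fp=15, fn=20)
print(f"accuracy = {accuracy(**counts):.2f}, MCC = {mcc(**counts):.2f}")

# A classifier that always predicts the majority class still scores well on
# accuracy but gets an MCC of zero.
majority = dict(tp=0, tn=175, fp=0, fn=50)
print(f"accuracy = {accuracy(**majority):.2f}, MCC = {mcc(**majority):.2f}")
```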
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.08875203 0.17942286 0.20372772 0.21905112 0.11756659 0.10055804
0.20381188 0.14366698 0.14148211 0.22374272]
mean value: 0.1621782064437866
key: score_time
value: [0.02979755 0.01557469 0.02248502 0.03827453 0.02860117 0.0163877
0.01603985 0.01669645 0.0194993 0.02031589]
mean value: 0.022367215156555174
key: test_mcc
value: [0.77892275 0.73050958 0.74942473 0.82425939 0.71319972 0.7800135
0.85201287 0.78932976 0.86452993 0.74456392]
mean value: 0.7826766159113144
key: train_mcc
value: [0.81677727 0.82356215 0.81538413 0.80551688 0.8115084 0.81720389
0.8018788 0.81804388 0.81099596 0.81850391]
mean value: 0.8139375262794872
key: test_accuracy
value: [0.88721805 0.86466165 0.87121212 0.90909091 0.84848485 0.88636364
0.92424242 0.89393939 0.93181818 0.87121212]
mean value: 0.8888243335611756
key: train_accuracy
value: [0.90664424 0.91000841 0.90588235 0.90084034 0.90420168 0.90672269
0.89915966 0.90672269 0.90336134 0.90756303]
mean value: 0.9051106430797718
key: test_fscore
value: [0.89208633 0.86956522 0.87943262 0.91428571 0.8630137 0.89361702
0.92753623 0.89705882 0.93333333 0.87591241]
mean value: 0.8945841404138405
key: train_fscore
value: [0.91084337 0.91391794 0.91011236 0.90544872 0.90821256 0.91098637
0.90369181 0.91141261 0.90807354 0.91157556]
mean value: 0.9094274846536654
key: test_precision
value: [0.84931507 0.84507042 0.82666667 0.86486486 0.7875 0.84
0.88888889 0.87142857 0.91304348 0.84507042]
mean value: 0.8531848383673435
key: train_precision
value: [0.87230769 0.8751926 0.87096774 0.86523737 0.87171561 0.87116564
0.86482335 0.86778116 0.86585366 0.87365177]
mean value: 0.8698696593137184
key: test_recall
value: [0.93939394 0.89552239 0.93939394 0.96969697 0.95454545 0.95454545
0.96969697 0.92424242 0.95454545 0.90909091]
mean value: 0.9410673903211217
key: train_recall
value: [0.95294118 0.95622896 0.95294118 0.94957983 0.94789916 0.95462185
0.94621849 0.95966387 0.95462185 0.95294118]
mean value: 0.9527657527657527
key: test_roc_auc
value: [0.88760742 0.86442786 0.87121212 0.90909091 0.84848485 0.88636364
0.92424242 0.89393939 0.93181818 0.87121212]
mean value: 0.8888398914518317
key: train_roc_auc
value: [0.90660527 0.91004725 0.90588235 0.90084034 0.90420168 0.90672269
0.89915966 0.90672269 0.90336134 0.90756303]
mean value: 0.9051106301106302
key: test_jcc
value: [0.80519481 0.76923077 0.78481013 0.84210526 0.75903614 0.80769231
0.86486486 0.81333333 0.875 0.77922078]
mean value: 0.8100488393855346
key: train_jcc
value: [0.83628319 0.84148148 0.83505155 0.8272328 0.83185841 0.8365243
0.82430454 0.8372434 0.83162518 0.83751846]
mean value: 0.8339123305107486
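Note on the metric keys above: assuming `jcc` is the binary Jaccard score and `fscore` is binary F1, the two are tied per fold by J = F1 / (2 - F1), since F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). The sketch below checks this against the per-fold `test_fscore` and `test_jcc` values copied from this block; the helper name `jaccard_from_f1` is illustrative.

```python
# Per-fold values copied from the test_fscore and test_jcc entries above.
test_fscore = [0.89208633, 0.86956522, 0.87943262, 0.91428571, 0.8630137,
               0.89361702, 0.92753623, 0.89705882, 0.93333333, 0.87591241]
test_jcc = [0.80519481, 0.76923077, 0.78481013, 0.84210526, 0.75903614,
            0.80769231, 0.86486486, 0.81333333, 0.875, 0.77922078]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard score of the same predictions."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(f"largest |derived - logged| across folds = {max_abs_diff:.2e}")
```

Because the conversion is monotonic, ranking models by F1 or by Jaccard within a fold gives the same order; the fold means can still rank differently because the mapping is nonlinear.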
MCC on Blind test: 0.72
Accuracy on Blind test: 0.89
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
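Note on the ConvergenceWarning above: the message suggests raising max_iter or scaling the data. The pipelines in this log already apply MinMaxScaler to the numerical columns, so raising max_iter is the remaining option. The sketch below is an illustration on synthetic data, not the script that produced this log; the dataset, the value max_iter=5000 and all variable names are assumptions.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 200 samples, 20 numerical features.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Scale first, then fit with a larger iteration budget than the default of 100.
model = make_pipeline(
    MinMaxScaler(),
    LogisticRegression(max_iter=5000, random_state=42),
)

# Record warnings so any remaining ConvergenceWarning can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X, y)

n_convergence_warnings = sum(
    issubclass(w.category, ConvergenceWarning) for w in caught
)
train_accuracy = model.score(X, y)
print("ConvergenceWarnings:", n_convergence_warnings)
print("Training accuracy:", train_accuracy)
```

If warnings persist at a high max_iter, the scikit-learn page linked in the warning lists alternative solvers.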
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [3.9267838 4.17609477 3.35307336 4.36423445 3.8363719 3.76100397
3.2580955 3.22691202 2.85449815 2.63607287]
mean value: 3.539314079284668
key: score_time
value: [0.02041101 0.02339602 0.01942444 0.03063869 0.0130496 0.01654649
0.01253414 0.02452779 0.01980877 0.0202508 ]
mean value: 0.02005877494812012
key: test_mcc
value: [0.79098805 0.75938489 0.76072577 0.82425939 0.77521709 0.79373126
0.8196886 0.78932976 0.84887469 0.74456392]
mean value: 0.7906763433813555
key: train_mcc
value: [0.83386511 0.84062945 0.83242875 0.8308463 0.82926583 0.82747573
0.82867279 0.83669268 0.8646291 0.83916906]
mean value: 0.8363674787603758
key: test_accuracy
value: [0.89473684 0.87969925 0.87878788 0.90909091 0.87878788 0.89393939
0.90909091 0.89393939 0.92424242 0.87121212]
mean value: 0.8933526999316472
key: train_accuracy
value: [0.91589571 0.91925988 0.91512605 0.91428571 0.91344538 0.91260504
0.91344538 0.91680672 0.93193277 0.91848739]
mean value: 0.9171290046716752
key: test_fscore
value: [0.89705882 0.88059701 0.88405797 0.91428571 0.89041096 0.9
0.91176471 0.89705882 0.92537313 0.87591241]
mean value: 0.8976519555158349
key: train_fscore
value: [0.91883117 0.92195122 0.91808597 0.91734198 0.91659919 0.91572123
0.91619203 0.92022562 0.93333333 0.92133009]
mean value: 0.9199611829964236
key: test_precision
value: [0.87142857 0.88059701 0.84722222 0.86486486 0.8125 0.85135135
0.88571429 0.87142857 0.91176471 0.84507042]
mean value: 0.8641942010352804
key: train_precision
value: [0.88854003 0.89150943 0.88714734 0.885759 0.884375 0.88419405
0.88801262 0.88390093 0.91451613 0.89028213]
mean value: 0.8898236660208628
key: test_recall
value: [0.92424242 0.88059701 0.92424242 0.96969697 0.98484848 0.95454545
0.93939394 0.92424242 0.93939394 0.90909091]
mean value: 0.9350293984622343
key: train_recall
value: [0.9512605 0.95454545 0.9512605 0.9512605 0.9512605 0.94957983
0.94621849 0.95966387 0.95294118 0.95462185]
mean value: 0.9522612681436211
key: test_roc_auc
value: [0.89495703 0.87969245 0.87878788 0.90909091 0.87878788 0.89393939
0.90909091 0.89393939 0.92424242 0.87121212]
mean value: 0.893374038896427
key: train_roc_auc
value: [0.91586594 0.91928953 0.91512605 0.91428571 0.91344538 0.91260504
0.91344538 0.91680672 0.93193277 0.91848739]
mean value: 0.917128993011346
key: test_jcc
value: [0.81333333 0.78666667 0.79220779 0.84210526 0.80246914 0.81818182
0.83783784 0.81333333 0.86111111 0.77922078]
mean value: 0.8146467070853036
key: train_jcc
value: [0.84984985 0.85520362 0.84857571 0.84730539 0.84603886 0.8445441
0.84534535 0.85223881 0.875 0.85413534]
mean value: 0.8518237020427452
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
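Note on reading the metrics above: the first-fold test values for LogisticRegressionCV are mutually consistent with a single confusion matrix. The counts below are inferred from the logged ratios (precision 61/70, recall 61/66, accuracy 119/133); they are not printed anywhere in this log, and reading "jcc" as the Jaccard index is likewise an assumption. A minimal sketch of how each metric follows from those counts:

```python
from math import sqrt

# Inferred first-fold counts (assumption, reconstructed from the logged ratios).
tp, fp, fn, tn = 61, 9, 5, 58

precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + fp + fn + tn)
fscore = 2 * precision * recall / (precision + recall)
jaccard = tp / (tp + fp + fn)

# Matthews correlation coefficient for a binary confusion matrix.
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Mean of recall and specificity; for this fold it equals the logged test_roc_auc.
balanced_accuracy = (recall + specificity) / 2

for name, value in [
    ("precision", precision), ("recall", recall), ("accuracy", accuracy),
    ("fscore", fscore), ("jcc", jaccard), ("mcc", mcc),
    ("balanced_accuracy", balanced_accuracy),
]:
    print(f"{name}: {value:.8f}")
```

The Jaccard index and F-score are tied by J = F / (2 - F), which is why test_jcc and test_fscore rank the folds identically.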
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.01902628 0.02021194 0.01614714 0.01634669 0.01632619 0.01641297
0.01635528 0.0163734 0.01630759 0.01628184]
mean value: 0.016978931427001954
key: score_time
value: [0.01291609 0.01158118 0.01146913 0.01135659 0.01135349 0.01156807
0.01147318 0.01137495 0.01159525 0.01156211]
mean value: 0.011625003814697266
key: test_mcc
value: [0.50384441 0.57928072 0.5611861 0.54645907 0.48574139 0.60633906
0.47018295 0.59152048 0.65521076 0.47412585]
mean value: 0.5473890791718706
key: train_mcc
value: [0.55295752 0.54784708 0.54518576 0.54641608 0.56334355 0.54310572
0.56322889 0.53467206 0.55645658 0.55167283]
mean value: 0.5504886058689347
key: test_accuracy
value: [0.7518797 0.78947368 0.78030303 0.77272727 0.74242424 0.8030303
0.73484848 0.79545455 0.82575758 0.73484848]
mean value: 0.7730747322852586
key: train_accuracy
value: [0.77628259 0.77375946 0.77226891 0.77310924 0.78151261 0.77142857
0.78151261 0.76722689 0.77815126 0.77563025]
mean value: 0.7750882388279113
key: test_fscore
value: [0.7518797 0.78787879 0.7751938 0.765625 0.734375 0.8
0.72868217 0.8 0.83453237 0.71544715]
mean value: 0.7693613984691421
key: train_fscore
value: [0.77226027 0.76949443 0.76658053 0.77001704 0.77777778 0.76791809
0.77853492 0.76385337 0.7755102 0.77120823]
mean value: 0.7713154861523569
key: test_precision
value: [0.74626866 0.8 0.79365079 0.79032258 0.75806452 0.8125
0.74603175 0.7826087 0.79452055 0.77192982]
mean value: 0.7795897361331934
key: train_precision
value: [0.78708551 0.78359511 0.78621908 0.7806563 0.79130435 0.77989601
0.78929188 0.77508651 0.7848537 0.78671329]
mean value: 0.7844701750183688
key: test_recall
value: [0.75757576 0.7761194 0.75757576 0.74242424 0.71212121 0.78787879
0.71212121 0.81818182 0.87878788 0.66666667]
mean value: 0.7609452736318408
key: train_recall
value: [0.75798319 0.75589226 0.74789916 0.75966387 0.76470588 0.75630252
0.76806723 0.75294118 0.76638655 0.75630252]
mean value: 0.7586144356732591
key: test_roc_auc
value: [0.75192221 0.78957485 0.78030303 0.77272727 0.74242424 0.8030303
0.73484848 0.79545455 0.82575758 0.73484848]
mean value: 0.7730890999547716
key: train_roc_auc
value: [0.77629799 0.77374445 0.77226891 0.77310924 0.78151261 0.77142857
0.78151261 0.76722689 0.77815126 0.77563025]
mean value: 0.7750882777353365
key: test_jcc
value: [0.60240964 0.65 0.63291139 0.62025316 0.58024691 0.66666667
0.57317073 0.66666667 0.71604938 0.55696203]
mean value: 0.6265336582169645
key: train_jcc
value: [0.62900976 0.62534819 0.62150838 0.62603878 0.63636364 0.6232687
0.63737796 0.61793103 0.63333333 0.62761506]
mean value: 0.6277794842107693
MCC on Blind test: 0.51
Accuracy on Blind test: 0.8
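Note on post-processing: each model's cross-validation results are printed as repeated "key: / value: / mean value:" blocks, which can be collected into a dictionary for comparison across models. The sketch below is one possible parser, not part of the original script; the names BLOCK, SAMPLE and parse_cv_blocks are hypothetical, and the sample text is the Gaussian NB test_mcc block copied from this log.

```python
import re
from statistics import fmean

# One block: "key: <name>", a bracketed value array that may wrap lines,
# then "mean value: <float>".
BLOCK = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]\s+mean value: (\S+)")

SAMPLE = """\
key: test_mcc
value: [0.50384441 0.57928072 0.5611861 0.54645907 0.48574139 0.60633906
0.47018295 0.59152048 0.65521076 0.47412585]
mean value: 0.5473890791718706
"""


def parse_cv_blocks(text):
    """Return {metric: {"values": [...], "mean": float}} for every block in text."""
    parsed = {}
    for name, values, mean in BLOCK.findall(text):
        parsed[name] = {
            "values": [float(v) for v in values.split()],
            "mean": float(mean),
        }
    return parsed


scores = parse_cv_blocks(SAMPLE)
recomputed = fmean(scores["test_mcc"]["values"])
print(len(scores["test_mcc"]["values"]), "folds; recomputed mean:", recomputed)
```

Because metric names repeat for every model, a full-log parser would also need to key the results by the preceding "Model_name:" line.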
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01857138 0.01959419 0.01895595 0.01894522 0.01899338 0.01901722
0.01656079 0.01675606 0.01655316 0.02041316]
mean value: 0.01843605041503906
key: score_time
value: [0.01313376 0.01306129 0.0130434 0.01314616 0.01308227 0.01307869
0.01137018 0.0112958 0.01129675 0.01233172]
mean value: 0.012484002113342284
key: test_mcc
value: [0.68631183 0.74631701 0.63636364 0.61313934 0.54950626 0.70511024
0.73267501 0.71220297 0.68568568 0.5611861 ]
mean value: 0.6628498064589106
key: train_mcc
value: [0.67197059 0.68369068 0.67740153 0.66050973 0.68212887 0.67495193
0.67082886 0.65736006 0.68191117 0.66546459]
mean value: 0.6726218014876005
key: test_accuracy
value: [0.84210526 0.87218045 0.81818182 0.8030303 0.77272727 0.84848485
0.86363636 0.85606061 0.84090909 0.78030303]
mean value: 0.8297619047619047
key: train_accuracy
value: [0.83515559 0.84020185 0.83781513 0.82857143 0.8394958 0.83613445
0.83445378 0.82689076 0.8394958 0.83109244]
mean value: 0.8349307023061537
key: test_fscore
value: [0.84671533 0.87769784 0.81818182 0.81690141 0.78571429 0.85915493
0.87142857 0.85714286 0.84892086 0.7751938 ]
mean value: 0.8357051702448439
key: train_fscore
value: [0.84090909 0.84751204 0.84347121 0.8368 0.8468324 0.84312148
0.84048583 0.83546326 0.84658635 0.83907126]
mean value: 0.8420252907043897
key: test_precision
value: [0.81690141 0.84722222 0.81818182 0.76315789 0.74324324 0.80263158
0.82432432 0.85074627 0.80821918 0.79365079]
mean value: 0.8068278730496224
key: train_precision
value: [0.81318681 0.80981595 0.81504702 0.79847328 0.80981595 0.80864198
0.8109375 0.79604262 0.81076923 0.80122324]
mean value: 0.8073953585042138
key: test_recall
value: [0.87878788 0.91044776 0.81818182 0.87878788 0.83333333 0.92424242
0.92424242 0.86363636 0.89393939 0.75757576]
mean value: 0.8683175033921302
key: train_recall
value: [0.87058824 0.88888889 0.87394958 0.8789916 0.88739496 0.88067227
0.87226891 0.8789916 0.88571429 0.88067227]
mean value: 0.879813258636788
key: test_roc_auc
value: [0.84237901 0.87189055 0.81818182 0.8030303 0.77272727 0.84848485
0.86363636 0.85606061 0.84090909 0.78030303]
mean value: 0.8297602894617819
key: train_roc_auc
value: [0.83512577 0.84024276 0.83781513 0.82857143 0.8394958 0.83613445
0.83445378 0.82689076 0.8394958 0.83109244]
mean value: 0.8349318111082817
key: test_jcc
value: [0.73417722 0.78205128 0.69230769 0.69047619 0.64705882 0.75308642
0.7721519 0.75 0.7375 0.63291139]
mean value: 0.7191720914446778
key: train_jcc
value: [0.7254902 0.73537604 0.72931276 0.71939477 0.73435327 0.72878999
0.72486034 0.71742112 0.73398329 0.72275862]
mean value: 0.7271740398801881
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01500893 0.01498532 0.01499057 0.01498437 0.01529908 0.01477098
0.01468444 0.01380014 0.01486588 0.01469088]
mean value: 0.014808058738708496
key: score_time
value: [0.04966807 0.02881622 0.03009772 0.03811646 0.02876663 0.03211856
0.02854252 0.03201628 0.02830029 0.02785945]
mean value: 0.03243021965026856
key: test_mcc
value: [0.73247531 0.67649057 0.57602211 0.73576721 0.69986771 0.63002408
0.68219104 0.68378319 0.66943868 0.67445327]
mean value: 0.6760513177693777
key: train_mcc
value: [0.76895612 0.78591503 0.79045033 0.76630612 0.78213811 0.77933874
0.77849745 0.76607208 0.77720446 0.78499714]
mean value: 0.7779875592649617
key: test_accuracy
value: [0.86466165 0.83458647 0.78787879 0.86363636 0.84848485 0.81060606
0.83333333 0.84090909 0.83333333 0.83333333]
mean value: 0.8350763271815903
key: train_accuracy
value: [0.88225399 0.89066442 0.89327731 0.88151261 0.88907563 0.88823529
0.88739496 0.88067227 0.88655462 0.8907563 ]
mean value: 0.8870397410436
key: test_fscore
value: [0.86956522 0.84722222 0.78461538 0.87323944 0.85507246 0.82517483
0.84931507 0.84671533 0.84057971 0.84507042]
mean value: 0.8436570079432013
key: train_fscore
value: [0.88835726 0.89616613 0.89831865 0.88674699 0.8944 0.89282836
0.89262821 0.88694268 0.89208633 0.89566613]
mean value: 0.8924140740905642
key: test_precision
value: [0.83333333 0.79220779 0.796875 0.81578947 0.81944444 0.76623377
0.775 0.81690141 0.80555556 0.78947368]
mean value: 0.8010814458120333
key: train_precision
value: [0.84522003 0.85258359 0.85779817 0.84923077 0.85343511 0.85758514
0.85298622 0.84266263 0.85060976 0.85714286]
mean value: 0.8519254268239733
key: test_recall
value: [0.90909091 0.91044776 0.77272727 0.93939394 0.89393939 0.89393939
0.93939394 0.87878788 0.87878788 0.90909091]
mean value: 0.8925599276345545
key: train_recall
value: [0.93613445 0.94444444 0.94285714 0.92773109 0.9394958 0.93109244
0.93613445 0.93613445 0.93781513 0.93781513]
mean value: 0.9369654528478057
key: test_roc_auc
value: [0.86499322 0.83401176 0.78787879 0.86363636 0.84848485 0.81060606
0.83333333 0.84090909 0.83333333 0.83333333]
mean value: 0.835052012663953
key: train_roc_auc
value: [0.88220864 0.89070962 0.89327731 0.88151261 0.88907563 0.88823529
0.88739496 0.88067227 0.88655462 0.8907563 ]
mean value: 0.8870397249809014
key: test_jcc
value: [0.76923077 0.73493976 0.64556962 0.775 0.74683544 0.70238095
0.73809524 0.73417722 0.725 0.73170732]
mean value: 0.7302936314297288
key: train_jcc
value: [0.79913917 0.81186686 0.81540698 0.7965368 0.8089725 0.80640466
0.80607815 0.79685265 0.80519481 0.81104651]
mean value: 0.8057499073390894
MCC on Blind test: 0.42
Accuracy on Blind test: 0.76
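The per-model blocks above follow a regular "key / value / mean value" layout, so they can be read back into a dictionary for later comparison. The following is a minimal sketch, assuming that layout holds throughout the log; `parse_cv_blocks` and `BLOCK_RE` are hypothetical names and are not part of the original scripts.

```python
import re

# Hypothetical helper (not from the original scripts): matches one
# "key: / value: [...] / mean value:" triple as printed in this log.
BLOCK_RE = re.compile(
    r"key:\s*(?P<key>\S+)\s*\n"
    r"value:\s*\[(?P<values>[^\]]*)\]\s*\n"
    r"mean value:\s*(?P<mean>\S+)"
)


def parse_cv_blocks(log_text):
    """Return {metric: {"values": [...], "mean": float}} for the given text."""
    results = {}
    for match in BLOCK_RE.finditer(log_text):
        # The bracketed array may wrap over several lines; split() handles that.
        values = [float(tok) for tok in match.group("values").split()]
        results[match.group("key")] = {
            "values": values,
            "mean": float(match.group("mean")),
        }
    return results


# Two blocks copied from the K-Nearest Neighbors section of this log.
sample = """key: test_mcc
value: [0.73247531 0.67649057 0.57602211 0.73576721 0.69986771 0.63002408
0.68219104 0.68378319 0.66943868 0.67445327]
mean value: 0.6760513177693777
key: train_mcc
value: [0.76895612 0.78591503 0.79045033 0.76630612 0.78213811 0.77933874
0.77849745 0.76607208 0.77720446 0.78499714]
mean value: 0.7779875592649617
"""

parsed = parse_cv_blocks(sample)
for metric, entry in parsed.items():
    print(metric, len(entry["values"]), entry["mean"])
```

If the same text contains several models, later blocks overwrite earlier ones with the same key, so the text would need to be split per model first (for example on the "Running model pipeline" lines).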
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.09179759 0.09042072 0.09019899 0.09118199 0.08997703 0.08817911
0.09113765 0.08191276 0.09158206 0.09061122]
mean value: 0.08969991207122803
key: score_time
value: [0.03039932 0.03026342 0.02985764 0.03125 0.03003645 0.0298965
0.03040457 0.02954245 0.03102684 0.0316658 ]
mean value: 0.030434298515319824
key: test_mcc
value: [0.71848125 0.73440686 0.70214689 0.74420841 0.73960026 0.73960026
0.80123362 0.77711043 0.83806027 0.73267501]
mean value: 0.7527523272092836
key: train_mcc
value: [0.7802016 0.7983579 0.79206602 0.78188958 0.7932249 0.80187807
0.7875319 0.79470566 0.78980614 0.79509686]
mean value: 0.7914758627965253
key: test_accuracy
value: [0.85714286 0.86466165 0.84848485 0.86363636 0.86363636 0.86363636
0.89393939 0.88636364 0.91666667 0.86363636]
mean value: 0.8721804511278195
key: train_accuracy
value: [0.88561817 0.89486964 0.89243697 0.88739496 0.89327731 0.89747899
0.88991597 0.89327731 0.8907563 0.89411765]
mean value: 0.8919143267062923
key: test_fscore
value: [0.86330935 0.87323944 0.85714286 0.87671233 0.875 0.875
0.90277778 0.89208633 0.92086331 0.87142857]
mean value: 0.8807559964541803
key: train_fscore
value: [0.89375 0.90196078 0.8992126 0.89448819 0.89976322 0.90378549
0.89709348 0.90039216 0.89811912 0.90063091]
mean value: 0.8989195954794376
key: test_precision
value: [0.82191781 0.82666667 0.81081081 0.8 0.80769231 0.80769231
0.83333333 0.84931507 0.87671233 0.82432432]
mean value: 0.8258464955999203
key: train_precision
value: [0.8350365 0.84434655 0.84592593 0.84148148 0.84821429 0.85141159
0.84218289 0.84411765 0.84140969 0.84843982]
mean value: 0.8442566379798555
key: test_recall
value: [0.90909091 0.92537313 0.90909091 0.96969697 0.95454545 0.95454545
0.98484848 0.93939394 0.96969697 0.92424242]
mean value: 0.9440524649479873
key: train_recall
value: [0.96134454 0.96801347 0.95966387 0.95462185 0.95798319 0.96302521
0.95966387 0.96470588 0.96302521 0.95966387]
mean value: 0.9611710947005065
key: test_roc_auc
value: [0.85753053 0.86420172 0.84848485 0.86363636 0.86363636 0.86363636
0.89393939 0.88636364 0.91666667 0.86363636]
mean value: 0.8721732247851651
key: train_roc_auc
value: [0.88555442 0.8949311 0.89243697 0.88739496 0.89327731 0.89747899
0.88991597 0.89327731 0.8907563 0.89411765]
mean value: 0.8919140989729225
key: test_jcc
value: [0.75949367 0.775 0.75 0.7804878 0.77777778 0.77777778
0.82278481 0.80519481 0.85333333 0.7721519 ]
mean value: 0.7874001878708579
key: train_jcc
value: [0.8079096 0.82142857 0.81688126 0.80911681 0.81779053 0.82446043
0.81339031 0.81883024 0.81507824 0.81922525]
mean value: 0.8164111249615581
MCC on Blind test: 0.65
Accuracy on Blind test: 0.85
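For a single positive class, the Jaccard index and the F1 score are tied by J = F1 / (2 - F1), so the test_jcc rows carry the same information as the test_fscore rows. The sketch below checks this against the SVM fold values logged above; `jaccard_from_f1` is a hypothetical helper name, and the tolerance only allows for the 8-decimal rounding in the log.

```python
def jaccard_from_f1(f1):
    """Binary Jaccard index implied by an F1 score: J = F1 / (2 - F1)."""
    return f1 / (2.0 - f1)


# Per-fold values copied from the SVM block of this log.
svm_test_fscore = [0.86330935, 0.87323944, 0.85714286, 0.87671233, 0.875,
                   0.875, 0.90277778, 0.89208633, 0.92086331, 0.87142857]
svm_test_jcc = [0.75949367, 0.775, 0.75, 0.7804878, 0.77777778,
                0.77777778, 0.82278481, 0.80519481, 0.85333333, 0.7721519]

# Largest disagreement between the logged Jaccard and the one implied by F1.
max_diff = max(
    abs(jaccard_from_f1(f1) - jcc)
    for f1, jcc in zip(svm_test_fscore, svm_test_jcc)
)
print("consistent within rounding:", max_diff < 1e-6)
```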
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [ 7.79554605 11.49424958 7.54219484 5.43912721 6.24815321 11.02255273
10.90592504 5.49640512 4.84995794 5.80157137]
mean value: 7.6595683097839355
key: score_time
value: [0.01336813 0.02104759 0.02115202 0.01333594 0.01317334 0.01572514
0.0206151 0.01981592 0.01315188 0.01340199]
mean value: 0.016478705406188964
key: test_mcc
value: [0.79079986 0.76065472 0.72861209 0.85004744 0.77352678 0.73267501
0.78932976 0.7800135 0.81855773 0.73029674]
mean value: 0.7754513633073671
key: train_mcc
value: [0.93970666 0.97816762 0.97821896 0.97649265 0.98319883 0.97482434
0.97816369 0.89872756 0.95975062 0.97502267]
mean value: 0.964227359943727
key: test_accuracy
value: [0.89473684 0.87969925 0.86363636 0.92424242 0.88636364 0.86363636
0.89393939 0.88636364 0.90909091 0.86363636]
mean value: 0.8865345181134655
key: train_accuracy
value: [0.96972246 0.98906644 0.98907563 0.98823529 0.99159664 0.98739496
0.98907563 0.94789916 0.97983193 0.98739496]
mean value: 0.9819293099914482
key: test_fscore
value: [0.890625 0.88405797 0.859375 0.92647059 0.88888889 0.87142857
0.89705882 0.89361702 0.90769231 0.86956522]
mean value: 0.8888779389456867
key: train_fscore
value: [0.96938776 0.9891031 0.98901099 0.98827471 0.99161074 0.9874477
0.98904802 0.94991922 0.97969543 0.98753117]
mean value: 0.9821028837722166
key: test_precision
value: [0.91935484 0.85915493 0.88709677 0.9 0.86956522 0.82432432
0.87142857 0.84 0.921875 0.83333333]
mean value: 0.8726132988958224
key: train_precision
value: [0.98106713 0.98497496 0.99489796 0.98497496 0.98994975 0.98333333
0.99155405 0.91446345 0.98637138 0.97697368]
mean value: 0.9788560654162172
key: test_recall
value: [0.86363636 0.91044776 0.83333333 0.95454545 0.90909091 0.92424242
0.92424242 0.95454545 0.89393939 0.90909091]
mean value: 0.9077114427860696
key: train_recall
value: [0.95798319 0.99326599 0.98319328 0.99159664 0.99327731 0.99159664
0.98655462 0.98823529 0.97310924 0.99831933]
mean value: 0.985713153948448
key: test_roc_auc
value: [0.89450475 0.8794663 0.86363636 0.92424242 0.88636364 0.86363636
0.89393939 0.88636364 0.90909091 0.86363636]
mean value: 0.8864880144730891
key: train_roc_auc
value: [0.96973234 0.98906997 0.98907563 0.98823529 0.99159664 0.98739496
0.98907563 0.94789916 0.97983193 0.98739496]
mean value: 0.9819306510482981
key: test_jcc
value: [0.8028169 0.79220779 0.75342466 0.8630137 0.8 0.7721519
0.81333333 0.80769231 0.83098592 0.76923077]
mean value: 0.8004857274264172
key: train_jcc
value: [0.94059406 0.97844113 0.97826087 0.97682119 0.98336106 0.97520661
0.97833333 0.90461538 0.960199 0.97536946]
mean value: 0.9651202106233013
MCC on Blind test: 0.63
Accuracy on Blind test: 0.86
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.08770657 0.09637952 0.08765864 0.0985477 0.07765341 0.06359506
0.08068061 0.08904219 0.062922 0.09313583]
mean value: 0.08373215198516845
key: score_time
value: [0.01011205 0.00990009 0.00997949 0.00955868 0.00991607 0.00972962
0.00970531 0.00996614 0.00964093 0.0096693 ]
mean value: 0.009817767143249511
key: test_mcc
value: [0.81953867 0.70036445 0.86452993 0.81818182 0.84848485 0.80386117
0.84848485 0.80386117 0.89486432 0.77281598]
mean value: 0.8174987205049051
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90977444 0.84962406 0.93181818 0.90909091 0.92424242 0.90151515
0.92424242 0.90151515 0.9469697 0.88636364]
mean value: 0.9085156071998177
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90909091 0.85507246 0.93333333 0.90909091 0.92424242 0.9037037
0.92424242 0.9037037 0.94573643 0.88549618]
mean value: 0.9093712488490158
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90909091 0.83098592 0.91304348 0.90909091 0.92424242 0.88405797
0.92424242 0.88405797 0.96825397 0.89230769]
mean value: 0.903937366301114
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.88059701 0.95454545 0.90909091 0.92424242 0.92424242
0.92424242 0.92424242 0.92424242 0.87878788]
mean value: 0.9153324287652645
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90976934 0.84938942 0.93181818 0.90909091 0.92424242 0.90151515
0.92424242 0.90151515 0.9469697 0.88636364]
mean value: 0.908491632745364
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.83333333 0.74683544 0.875 0.83333333 0.85915493 0.82432432
0.85915493 0.82432432 0.89705882 0.79452055]
mean value: 0.8347039988982837
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.71
Accuracy on Blind test: 0.89
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.21911669 0.22973084 0.22298813 0.2171011 0.21578741 0.20552397
0.21279311 0.21100545 0.22396517 0.20665073]
mean value: 0.21646625995635987
key: score_time
value: [0.02211118 0.02221394 0.0213306 0.02232122 0.02003217 0.02073336
0.02134371 0.02109003 0.02177501 0.02143049]
mean value: 0.021438169479370116
key: test_mcc
value: [0.8046133 0.74631701 0.72760688 0.85004744 0.83419555 0.74456392
0.82425939 0.85201287 0.90909091 0.77352678]
mean value: 0.8066234039020962
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90225564 0.87218045 0.86363636 0.92424242 0.91666667 0.87121212
0.90909091 0.92424242 0.95454545 0.88636364]
mean value: 0.9024436090225564
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90225564 0.87769784 0.86153846 0.92647059 0.91851852 0.87591241
0.91428571 0.92753623 0.95454545 0.88888889]
mean value: 0.9047649747479877
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.89552239 0.84722222 0.875 0.9 0.89855072 0.84507042
0.86486486 0.88888889 0.95454545 0.86956522]
mean value: 0.8839230183145329
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.90909091 0.91044776 0.84848485 0.95454545 0.93939394 0.90909091
0.96969697 0.96969697 0.95454545 0.90909091]
mean value: 0.9274084124830394
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.90230665 0.87189055 0.86363636 0.92424242 0.91666667 0.87121212
0.90909091 0.92424242 0.95454545 0.88636364]
mean value: 0.9024197195838988
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82191781 0.78205128 0.75675676 0.8630137 0.84931507 0.77922078
0.84210526 0.86486486 0.91304348 0.8 ]
mean value: 0.8272288999654913
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01580811 0.01548338 0.01368093 0.01516891 0.0141232 0.01596141
0.01456952 0.01595473 0.01479793 0.01596093]
mean value: 0.015150904655456543
key: score_time
value: [0.01032925 0.01032257 0.0101037 0.00945139 0.01024938 0.00988722
0.01068926 0.0100286 0.01042461 0.01069736]
mean value: 0.010218334197998048
key: test_mcc
value: [0.50384441 0.64007417 0.53085171 0.66697297 0.63900965 0.63753558
0.65521076 0.69825325 0.5992912 0.56067042]
mean value: 0.6131714128542862
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.7518797 0.81954887 0.76515152 0.83333333 0.81818182 0.81818182
0.82575758 0.84848485 0.79545455 0.78030303]
mean value: 0.8056277056277057
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.7518797 0.82608696 0.75968992 0.83076923 0.82608696 0.82352941
0.83453237 0.85294118 0.81118881 0.77862595]
mean value: 0.8095330493264747
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.74626866 0.8028169 0.77777778 0.84375 0.79166667 0.8
0.79452055 0.82857143 0.75324675 0.78461538]
mean value: 0.7923234116948085
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75757576 0.85074627 0.74242424 0.81818182 0.86363636 0.84848485
0.87878788 0.87878788 0.87878788 0.77272727]
mean value: 0.8290140208050656
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.75192221 0.81931253 0.76515152 0.83333333 0.81818182 0.81818182
0.82575758 0.84848485 0.79545455 0.78030303]
mean value: 0.8056083220262324
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.60240964 0.7037037 0.6125 0.71052632 0.7037037 0.7
0.71604938 0.74358974 0.68235294 0.6375 ]
mean value: 0.6812335429233362
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.34
Accuracy on Blind test: 0.74
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [3.97470117 3.93355489 6.33849072 7.47062922 6.05037212 5.72808099
4.41282606 5.67960405 6.03741074 6.04518557]
mean value: 5.567085552215576
key: score_time
value: [0.11290765 0.10394311 0.25654578 0.17098546 0.14105511 0.14017177
0.1155014 0.14093661 0.14221764 0.17594004]
mean value: 0.15002045631408692
key: test_mcc
value: [0.94028503 0.80667588 0.83573501 0.89486432 0.9251987 0.7431924
0.91076511 0.89404202 0.93939394 0.88040627]
mean value: 0.8770558685927923
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96992481 0.90225564 0.91666667 0.9469697 0.96212121 0.87121212
0.95454545 0.9469697 0.96969697 0.93939394]
mean value: 0.9379756208703577
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97014925 0.90647482 0.91338583 0.94814815 0.96296296 0.87407407
0.95588235 0.94736842 0.96969697 0.94117647]
mean value: 0.938931930011108
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95588235 0.875 0.95081967 0.92753623 0.94202899 0.85507246
0.92857143 0.94029851 0.96969697 0.91428571]
mean value: 0.9259192326248543
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.94029851 0.87878788 0.96969697 0.98484848 0.89393939
0.98484848 0.95454545 0.96969697 0.96969697]
mean value: 0.9531207598371778
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97003618 0.90196744 0.91666667 0.9469697 0.96212121 0.87121212
0.95454545 0.9469697 0.96969697 0.93939394]
mean value: 0.9379579375848033
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94202899 0.82894737 0.84057971 0.90140845 0.92857143 0.77631579
0.91549296 0.9 0.94117647 0.88888889]
mean value: 0.8863410050046168
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [2.18256736 3.10948396 3.41758657 2.72969723 2.87032175 2.7995379
2.65826178 3.18488312 2.70599651 3.31159854]
mean value: 2.8969934701919557
key: score_time
value: [0.23093939 0.23506045 0.21551728 0.20308757 0.21437836 0.17920065
0.23916888 0.19888854 0.24526668 0.1810782 ]
mean value: 0.21425859928131102
key: test_mcc
value: [0.91145917 0.77448543 0.83419555 0.89486432 0.92690611 0.78932976
0.89486432 0.84848485 0.92434853 0.86452993]
mean value: 0.8663467976628745
key: train_mcc
value: [0.94135097 0.95144787 0.94820052 0.94483811 0.94492358 0.94971397
0.94648053 0.94304896 0.94635215 0.94812549]
mean value: 0.9464482143823427
key: test_accuracy
value: [0.95488722 0.88721805 0.91666667 0.9469697 0.96212121 0.89393939
0.9469697 0.92424242 0.96212121 0.93181818]
mean value: 0.9326953748006379
key: train_accuracy
value: [0.9705635 0.97560976 0.97394958 0.97226891 0.97226891 0.97478992
0.97310924 0.97142857 0.97310924 0.97394958]
mean value: 0.9731047204415828
key: test_fscore
value: [0.95588235 0.88888889 0.91472868 0.94814815 0.96350365 0.89705882
0.94814815 0.92424242 0.96240602 0.93333333]
mean value: 0.9336340466074704
key: train_fscore
value: [0.97090607 0.97585346 0.97427386 0.97261411 0.97265949 0.975
0.97342193 0.97171381 0.97333333 0.97423109]
mean value: 0.9734007136255515
key: test_precision
value: [0.92857143 0.88235294 0.93650794 0.92753623 0.92957746 0.87142857
0.92753623 0.92424242 0.95522388 0.91304348]
mean value: 0.9196020589341564
key: train_precision
value: [0.96052632 0.96540362 0.96229508 0.96065574 0.95915033 0.96694215
0.96223317 0.96210873 0.96528926 0.96381579]
mean value: 0.9628420181669508
key: test_recall
value: [0.98484848 0.89552239 0.89393939 0.96969697 1. 0.92424242
0.96969697 0.92424242 0.96969697 0.95454545]
mean value: 0.9486431478968792
key: train_recall
value: [0.98151261 0.98653199 0.98655462 0.98487395 0.98655462 0.98319328
0.98487395 0.98151261 0.98151261 0.98487395]
mean value: 0.9841994171405937
key: test_roc_auc
value: [0.95511081 0.88715513 0.91666667 0.9469697 0.96212121 0.89393939
0.9469697 0.92424242 0.96212121 0.93181818]
mean value: 0.9327114427860697
key: train_roc_auc
value: [0.97055428 0.97561893 0.97394958 0.97226891 0.97226891 0.97478992
0.97310924 0.97142857 0.97310924 0.97394958]
mean value: 0.9731047166341285
key: test_jcc
value: [0.91549296 0.8 0.84285714 0.90140845 0.92957746 0.81333333
0.90140845 0.85915493 0.92753623 0.875 ]
mean value: 0.8765768961595661
key: train_jcc
value: [0.94345719 0.95284553 0.94983819 0.94668821 0.94677419 0.95121951
0.94822006 0.94498382 0.94805195 0.94975689]
mean value: 0.9481835537416387
MCC on Blind test: 0.79
Accuracy on Blind test: 0.92
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.03519893 0.04723525 0.04286456 0.04674935 0.04282951 0.04792905
0.0430634 0.04757333 0.04246354 0.04807854]
mean value: 0.04439854621887207
key: score_time
value: [0.02588463 0.02797842 0.02426362 0.02864552 0.02428579 0.02721524
0.02426314 0.02780771 0.02645445 0.02699757]
mean value: 0.026379609107971193
key: test_mcc
value: [0.68631183 0.74631701 0.63636364 0.61313934 0.54950626 0.70511024
0.73267501 0.71220297 0.68568568 0.5611861 ]
mean value: 0.6628498064589106
key: train_mcc
value: [0.67197059 0.68369068 0.67740153 0.66050973 0.68212887 0.67495193
0.67082886 0.65736006 0.68191117 0.66546459]
mean value: 0.6726218014876005
key: test_accuracy
value: [0.84210526 0.87218045 0.81818182 0.8030303 0.77272727 0.84848485
0.86363636 0.85606061 0.84090909 0.78030303]
mean value: 0.8297619047619047
key: train_accuracy
value: [0.83515559 0.84020185 0.83781513 0.82857143 0.8394958 0.83613445
0.83445378 0.82689076 0.8394958 0.83109244]
mean value: 0.8349307023061537
key: test_fscore
value: [0.84671533 0.87769784 0.81818182 0.81690141 0.78571429 0.85915493
0.87142857 0.85714286 0.84892086 0.7751938 ]
mean value: 0.8357051702448439
key: train_fscore
value: [0.84090909 0.84751204 0.84347121 0.8368 0.8468324 0.84312148
0.84048583 0.83546326 0.84658635 0.83907126]
mean value: 0.8420252907043897
key: test_precision
value: [0.81690141 0.84722222 0.81818182 0.76315789 0.74324324 0.80263158
0.82432432 0.85074627 0.80821918 0.79365079]
mean value: 0.8068278730496224
key: train_precision
value: [0.81318681 0.80981595 0.81504702 0.79847328 0.80981595 0.80864198
0.8109375 0.79604262 0.81076923 0.80122324]
mean value: 0.8073953585042138
key: test_recall
value: [0.87878788 0.91044776 0.81818182 0.87878788 0.83333333 0.92424242
0.92424242 0.86363636 0.89393939 0.75757576]
mean value: 0.8683175033921302
key: train_recall
value: [0.87058824 0.88888889 0.87394958 0.8789916 0.88739496 0.88067227
0.87226891 0.8789916 0.88571429 0.88067227]
mean value: 0.879813258636788
key: test_roc_auc
value: [0.84237901 0.87189055 0.81818182 0.8030303 0.77272727 0.84848485
0.86363636 0.85606061 0.84090909 0.78030303]
mean value: 0.8297602894617819
key: train_roc_auc
value: [0.83512577 0.84024276 0.83781513 0.82857143 0.8394958 0.83613445
0.83445378 0.82689076 0.8394958 0.83109244]
mean value: 0.8349318111082817
key: test_jcc
value: [0.73417722 0.78205128 0.69230769 0.69047619 0.64705882 0.75308642
0.7721519 0.75 0.7375 0.63291139]
mean value: 0.7191720914446778
key: train_jcc
value: [0.7254902 0.73537604 0.72931276 0.71939477 0.73435327 0.72878999
0.72486034 0.71742112 0.73398329 0.72275862]
mean value: 0.7271740398801881
MCC on Blind test: 0.59
Accuracy on Blind test: 0.83
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [3.90365171 2.85308623 2.88379312 2.87783575 2.93418789 2.8434577
2.931072 2.88599873 2.94633913 2.89332843]
mean value: 2.995275068283081
key: score_time
value: [0.01317334 0.01315236 0.01351786 0.013798 0.01476145 0.0138061
0.01398635 0.01382971 0.01496601 0.01291895]
mean value: 0.01379101276397705
key: test_mcc
value: [0.91145917 0.80667588 0.84848485 0.88040627 0.93982555 0.81855773
0.9701425 0.89486432 0.87919164 0.87919164]
mean value: 0.8828799556470258
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95488722 0.90225564 0.92424242 0.93939394 0.96969697 0.90909091
0.98484848 0.9469697 0.93939394 0.93939394]
mean value: 0.9410173160173161
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95588235 0.90647482 0.92424242 0.94117647 0.97014925 0.91044776
0.98507463 0.94814815 0.93846154 0.94029851]
mean value: 0.9420355903779138
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92857143 0.875 0.92424242 0.91428571 0.95588235 0.89705882
0.97058824 0.92753623 0.953125 0.92647059]
mean value: 0.9272760798983625
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.94029851 0.92424242 0.96969697 0.98484848 0.92424242
1. 0.96969697 0.92424242 0.95454545]
mean value: 0.9576662143826323
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95511081 0.90196744 0.92424242 0.93939394 0.96969697 0.90909091
0.98484848 0.9469697 0.93939394 0.93939394]
mean value: 0.941010854816825
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91549296 0.82894737 0.85915493 0.88888889 0.94202899 0.83561644
0.97058824 0.90140845 0.88405797 0.88732394]
mean value: 0.8913508169172104
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.08217144 0.09075713 0.08338308 0.113235 0.12136722 0.08390951
0.07351661 0.0989933 0.09131002 0.10390306]
mean value: 0.09425463676452636
key: score_time
value: [0.01993656 0.02662683 0.01330376 0.01766443 0.02078724 0.01329708
0.01319885 0.01317883 0.02058554 0.02135062]
mean value: 0.01799297332763672
key: test_mcc
value: [0.7515073 0.72930801 0.73267501 0.74942473 0.75725927 0.77711043
0.80386117 0.78824078 0.80534465 0.76072577]
mean value: 0.7655457131180822
key: train_mcc
value: [0.83607958 0.84513414 0.8445445 0.8344302 0.8395983 0.84233833
0.83064344 0.8395983 0.83064344 0.84095802]
mean value: 0.8383968239634668
key: test_accuracy
value: [0.87218045 0.86466165 0.86363636 0.87121212 0.87121212 0.88636364
0.90151515 0.89393939 0.90151515 0.87878788]
mean value: 0.8805023923444976
key: train_accuracy
value: [0.91673675 0.92094197 0.9210084 0.91596639 0.91848739 0.92016807
0.91428571 0.91848739 0.91428571 0.91932773]
mean value: 0.9179695528337491
key: test_fscore
value: [0.87943262 0.86567164 0.87142857 0.87943262 0.88275862 0.89208633
0.9037037 0.89552239 0.90510949 0.88405797]
mean value: 0.8859203964900466
key: train_fscore
value: [0.91996766 0.92419355 0.92394822 0.91909385 0.92158448 0.92282697
0.91720779 0.92158448 0.91720779 0.92220421]
mean value: 0.9209819008738551
key: test_precision
value: [0.82666667 0.86567164 0.82432432 0.82666667 0.81012658 0.84931507
0.88405797 0.88235294 0.87323944 0.84722222]
mean value: 0.8489643521253238
key: train_precision
value: [0.88629283 0.8869969 0.89079563 0.88611544 0.88785047 0.89308176
0.88697017 0.88785047 0.88697017 0.89045383]
mean value: 0.8883377690429243
key: test_recall
value: [0.93939394 0.86567164 0.92424242 0.93939394 0.96969697 0.93939394
0.92424242 0.90909091 0.93939394 0.92424242]
mean value: 0.9274762550881954
key: train_recall
value: [0.95630252 0.96464646 0.95966387 0.95462185 0.95798319 0.95462185
0.94957983 0.95798319 0.94957983 0.95630252]
mean value: 0.956128512010865
key: test_roc_auc
value: [0.87268204 0.864654 0.86363636 0.87121212 0.87121212 0.88636364
0.90151515 0.89393939 0.90151515 0.87878788]
mean value: 0.8805517865219358
key: train_roc_auc
value: [0.91670345 0.92097869 0.9210084 0.91596639 0.91848739 0.92016807
0.91428571 0.91848739 0.91428571 0.91932773]
mean value: 0.9179698950287186
key: test_jcc
value: [0.78481013 0.76315789 0.7721519 0.78481013 0.79012346 0.80519481
0.82432432 0.81081081 0.82666667 0.79220779]
mean value: 0.7954257902630099
key: train_jcc
value: [0.85179641 0.85907046 0.85864662 0.8502994 0.85457271 0.85671192
0.84707646 0.85457271 0.84707646 0.8556391 ]
mean value: 0.8535462253796596
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01841187 0.04239488 0.04338026 0.018224 0.0188179 0.01878238
0.01850367 0.01893735 0.01868582 0.01888943]
mean value: 0.023502755165100097
key: score_time
value: [0.02268887 0.02586532 0.02344608 0.01426363 0.01349759 0.01352072
0.01398182 0.01327801 0.0132823 0.01402259]
mean value: 0.01678469181060791
key: test_mcc
value: [0.5188958 0.66938237 0.56222174 0.54570516 0.59097693 0.72760688
0.65765844 0.63753558 0.73029674 0.62128344]
mean value: 0.6261563059343562
key: train_mcc
value: [0.63347346 0.65058239 0.62701829 0.63876463 0.63370295 0.65287677
0.61513387 0.60001356 0.64604365 0.61848827]
mean value: 0.6316097837906691
key: test_accuracy
value: [0.7593985 0.83458647 0.78030303 0.77272727 0.79545455 0.86363636
0.82575758 0.81818182 0.86363636 0.81060606]
mean value: 0.8124287992709045
key: train_accuracy
value: [0.81665265 0.82506308 0.81344538 0.81932773 0.81680672 0.82605042
0.80756303 0.8 0.82268908 0.8092437 ]
mean value: 0.815684177792227
key: test_fscore
value: [0.75384615 0.83823529 0.77165354 0.7761194 0.79389313 0.86567164
0.83687943 0.8125 0.86956522 0.81203008]
mean value: 0.8130393891021387
key: train_fscore
value: [0.81893688 0.82809917 0.81530782 0.82098251 0.81833333 0.83018868
0.80804694 0.80067002 0.82662284 0.80940386]
mean value: 0.8176592060673311
key: test_precision
value: [0.765625 0.82608696 0.80327869 0.76470588 0.8 0.85294118
0.78666667 0.83870968 0.83333333 0.80597015]
mean value: 0.8077317530542945
key: train_precision
value: [0.80952381 0.81331169 0.80724876 0.81353135 0.81157025 0.81089744
0.80602007 0.79799666 0.80868167 0.80872483]
mean value: 0.8087506531449246
key: test_recall
value: [0.74242424 0.85074627 0.74242424 0.78787879 0.78787879 0.87878788
0.89393939 0.78787879 0.90909091 0.81818182]
mean value: 0.8199231117141564
key: train_recall
value: [0.82857143 0.84343434 0.82352941 0.82857143 0.82521008 0.85042017
0.81008403 0.80336134 0.84537815 0.81008403]
mean value: 0.8268644427467957
key: test_roc_auc
value: [0.75927182 0.83446404 0.78030303 0.77272727 0.79545455 0.86363636
0.82575758 0.81818182 0.86363636 0.81060606]
mean value: 0.8124038896426956
key: train_roc_auc
value: [0.81664262 0.82507852 0.81344538 0.81932773 0.81680672 0.82605042
0.80756303 0.8 0.82268908 0.8092437 ]
mean value: 0.8156847183317772
key: test_jcc
value: [0.60493827 0.72151899 0.62820513 0.63414634 0.65822785 0.76315789
0.7195122 0.68421053 0.76923077 0.6835443 ]
mean value: 0.686669226591934
key: train_jcc
value: [0.69338959 0.70662906 0.68820225 0.69632768 0.69252468 0.70967742
0.67791842 0.66759777 0.70448179 0.67983075]
mean value: 0.6916579410309931
MCC on Blind test: 0.54
Accuracy on Blind test: 0.81
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03066039 0.04237843 0.0722537 0.05127764 0.03083706 0.04251885
0.03168774 0.0351162 0.02943254 0.03488851]
mean value: 0.04010510444641113
key: score_time
value: [0.01965117 0.01301241 0.01952147 0.01264167 0.01278877 0.01377964
0.01281571 0.01285934 0.0128963 0.012851 ]
mean value: 0.014281749725341797
key: test_mcc
value: [0.71848125 0.68595814 0.68723073 0.81442137 0.69293487 0.71319972
0.77521709 0.73576721 0.86452993 0.36155076]
mean value: 0.7049291083404625
key: train_mcc
value: [0.79940017 0.81798627 0.74027929 0.8033942 0.76201646 0.80154842
0.77356593 0.78546966 0.79651203 0.54421959]
mean value: 0.7624392036940087
key: test_accuracy
value: [0.85714286 0.84210526 0.82575758 0.90151515 0.83333333 0.84848485
0.87878788 0.86363636 0.93181818 0.65151515]
mean value: 0.8434096605149237
key: train_accuracy
value: [0.89823381 0.9058032 0.85882353 0.89747899 0.87394958 0.89495798
0.87815126 0.88571429 0.89579832 0.73445378]
mean value: 0.8723364736979737
key: test_fscore
value: [0.86330935 0.84892086 0.8496732 0.90909091 0.85333333 0.8630137
0.89041096 0.87323944 0.93333333 0.52083333]
mean value: 0.8405158421686592
key: train_fscore
value: [0.90249799 0.91125198 0.8742515 0.90438871 0.88496933 0.90317583
0.88973384 0.89554531 0.90127389 0.64414414]
mean value: 0.8711232520757678
key: test_precision
value: [0.82191781 0.81944444 0.74712644 0.84415584 0.76190476 0.7875
0.8125 0.81578947 0.91304348 0.83333333]
mean value: 0.8156715580784251
key: train_precision
value: [0.86687307 0.86077844 0.78812416 0.84728341 0.81382228 0.83764368
0.8125 0.82461103 0.85627837 0.97610922]
mean value: 0.8484023648159316
key: test_recall
value: [0.90909091 0.88059701 0.98484848 0.98484848 0.96969697 0.95454545
0.98484848 0.93939394 0.95454545 0.37878788]
mean value: 0.8941203075531434
key: train_recall
value: [0.94117647 0.96801347 0.98151261 0.9697479 0.9697479 0.97983193
0.98319328 0.97983193 0.9512605 0.48067227]
mean value: 0.9204988257929434
key: test_roc_auc
value: [0.85753053 0.84181366 0.82575758 0.90151515 0.83333333 0.84848485
0.87878788 0.86363636 0.93181818 0.65151515]
mean value: 0.8434192672998643
key: train_roc_auc
value: [0.89819766 0.90585547 0.85882353 0.89747899 0.87394958 0.89495798
0.87815126 0.88571429 0.89579832 0.73445378]
mean value: 0.8723380867498515
key: test_jcc
value: [0.75949367 0.7375 0.73863636 0.83333333 0.74418605 0.75903614
0.80246914 0.775 0.875 0.35211268]
mean value: 0.7376767370804521
key: train_jcc
value: [0.82232012 0.83697234 0.77659574 0.82546495 0.79367263 0.82344633
0.80136986 0.8108484 0.82028986 0.47508306]
mean value: 0.778606328564591
MCC on Blind test: 0.48
Accuracy on Blind test: 0.81
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03250432 0.04610515 0.04896641 0.05026364 0.07544041 0.0446434
0.05205488 0.07194495 0.04098344 0.04652524]
mean value: 0.050943183898925784
key: score_time
value: [0.01292706 0.01292372 0.01289201 0.01292682 0.01127601 0.01288795
0.01278996 0.01134944 0.01288676 0.0128274 ]
mean value: 0.01256871223449707
key: test_mcc
value: [0.42640275 0.76265475 0.70878358 0.83573501 0.62994079 0.67426617
0.70511024 0.76072577 0.78816781 0.4747723 ]
mean value: 0.6766559176006244
key: train_mcc
value: [0.46894069 0.82260467 0.80301916 0.84706002 0.68505868 0.76682222
0.77055923 0.82251813 0.73391183 0.68531986]
mean value: 0.7405814492275309
key: test_accuracy
value: [0.68421053 0.87969925 0.84848485 0.91666667 0.8030303 0.82575758
0.84848485 0.87878788 0.88636364 0.71969697]
mean value: 0.8291182501708818
key: train_accuracy
value: [0.68797309 0.91084945 0.89579832 0.92352941 0.82773109 0.87310924
0.88151261 0.91092437 0.85378151 0.82605042]
mean value: 0.8591259514739453
key: test_fscore
value: [0.57142857 0.875 0.86111111 0.91970803 0.77192982 0.84563758
0.83606557 0.87301587 0.89655172 0.65420561]
mean value: 0.8104653898591715
key: train_fscore
value: [0.55568862 0.90862069 0.90387597 0.9234651 0.79842675 0.8862095
0.87262873 0.90909091 0.87091988 0.79443893]
mean value: 0.8423365062744236
key: test_precision
value: [0.875 0.91803279 0.79487179 0.88732394 0.91666667 0.75903614
0.91071429 0.91666667 0.82278481 0.85365854]
mean value: 0.8654755635756893
key: train_precision
value: [0.96666667 0.93109541 0.83884892 0.92424242 0.96208531 0.80327869
0.94335938 0.92819615 0.77954847 0.97087379]
mean value: 0.9048195196007951
key: test_recall
value: [0.42424242 0.8358209 0.93939394 0.95454545 0.66666667 0.95454545
0.77272727 0.83333333 0.98484848 0.53030303]
mean value: 0.7896426956128448
key: train_recall
value: [0.38991597 0.88720539 0.97983193 0.92268908 0.68235294 0.98823529
0.81176471 0.8907563 0.98655462 0.67226891]
mean value: 0.8211575135104547
key: test_roc_auc
value: [0.68227047 0.88003166 0.84848485 0.91666667 0.8030303 0.82575758
0.84848485 0.87878788 0.88636364 0.71969697]
mean value: 0.8289574853007688
key: train_roc_auc
value: [0.68822398 0.91082958 0.89579832 0.92352941 0.82773109 0.87310924
0.88151261 0.91092437 0.85378151 0.82605042]
mean value: 0.8591490535608183
key: test_jcc
value: [0.4 0.77777778 0.75609756 0.85135135 0.62857143 0.73255814
0.71830986 0.77464789 0.8125 0.48611111]
mean value: 0.6937925115801036
key: train_jcc
value: [0.38474295 0.83254344 0.82461103 0.8578125 0.66448445 0.79566982
0.77403846 0.83333333 0.77135348 0.65897858]
mean value: 0.739756806448993
MCC on Blind test: 0.65
Accuracy on Blind test: 0.84
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.43816185 0.42773724 0.43033075 0.41837001 0.42521048 0.42237544
0.42985082 0.42985892 0.45174265 0.4252367 ]
mean value: 0.4298874855041504
key: score_time
value: [0.01665282 0.01664472 0.0165534 0.01650286 0.01656866 0.01707387
0.01700234 0.01646709 0.0166471 0.01660824]
mean value: 0.01667211055755615
key: test_mcc
value: [0.86558065 0.78977162 0.86452993 0.8824419 0.86452993 0.78787879
0.92434853 0.833429 0.87919164 0.86452993]
mean value: 0.8556231936727914
key: train_mcc
value: [0.91951324 0.92281332 0.92138456 0.91780392 0.92795261 0.93618205
0.92974798 0.9363249 0.92130122 0.93156752]
mean value: 0.9264591303705949
key: test_accuracy
value: [0.93233083 0.89473684 0.93181818 0.93939394 0.93181818 0.89393939
0.96212121 0.91666667 0.93939394 0.93181818]
mean value: 0.9274037366142629
key: train_accuracy
value: [0.95962994 0.96131203 0.9605042 0.95882353 0.96386555 0.96806723
0.96470588 0.96806723 0.9605042 0.96554622]
mean value: 0.9631026001653815
key: test_fscore
value: [0.93333333 0.89705882 0.93023256 0.94202899 0.93333333 0.89393939
0.96183206 0.91729323 0.94029851 0.93333333]
mean value: 0.9282683562729682
key: train_fscore
value: [0.96013289 0.96166667 0.96106048 0.95920067 0.96425603 0.96822742
0.96517413 0.96838602 0.96099585 0.96608768]
mean value: 0.9635187834058504
key: test_precision
value: [0.91304348 0.88405797 0.95238095 0.90277778 0.91304348 0.89393939
0.96923077 0.91044776 0.92647059 0.91304348]
mean value: 0.9178435648555319
key: train_precision
value: [0.94909688 0.95214521 0.94771242 0.95049505 0.95394737 0.96339434
0.95253682 0.95881384 0.94918033 0.95114007]
mean value: 0.9528462330084465
key: test_recall
value: [0.95454545 0.91044776 0.90909091 0.98484848 0.95454545 0.89393939
0.95454545 0.92424242 0.95454545 0.95454545]
mean value: 0.9395296246042515
key: train_recall
value: [0.97142857 0.97138047 0.97478992 0.96806723 0.97478992 0.97310924
0.97815126 0.97815126 0.97310924 0.98151261]
mean value: 0.974448971507795
key: test_roc_auc
value: [0.93249661 0.89461782 0.93181818 0.93939394 0.93181818 0.89393939
0.96212121 0.91666667 0.93939394 0.93181818]
mean value: 0.9274084124830394
key: train_roc_auc
value: [0.95962001 0.96132049 0.9605042 0.95882353 0.96386555 0.96806723
0.96470588 0.96806723 0.9605042 0.96554622]
mean value: 0.9631024531024531
key: test_jcc
value: [0.875 0.81333333 0.86956522 0.89041096 0.875 0.80821918
0.92647059 0.84722222 0.88732394 0.875 ]
mean value: 0.8667545441830428
key: train_jcc
value: [0.92332268 0.92616372 0.92503987 0.9216 0.93097913 0.93841167
0.93269231 0.93870968 0.92492013 0.9344 ]
mean value: 0.929623919553232
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.24893618 0.26400042 0.26318049 0.26846218 0.28313684 0.26294708
0.2789216 0.26224089 0.26190519 0.27044439]
mean value: 0.2664175271987915
key: score_time
value: [0.01890635 0.03807998 0.04070425 0.04163957 0.03528833 0.02002311
0.04702449 0.03652 0.03795028 0.02937794]
mean value: 0.03455142974853516
key: test_mcc
value: [0.89484396 0.73050958 0.78932976 0.85004744 0.94112395 0.80312249
0.92434853 0.84848485 0.89404202 0.88040627]
mean value: 0.8556258862023336
key: train_mcc
value: [0.98822677 0.99327725 0.99832074 0.99160924 0.99159804 0.99328292
1. 0.99495939 0.98992156 0.98823669]
mean value: 0.9929432605655744
key: test_accuracy
value: [0.94736842 0.86466165 0.89393939 0.92424242 0.96969697 0.90151515
0.96212121 0.92424242 0.9469697 0.93939394]
mean value: 0.9274151287309182
key: train_accuracy
value: [0.9941127 0.99663583 0.99915966 0.99579832 0.99579832 0.99663866
1. 0.99747899 0.99495798 0.99411765]
mean value: 0.996469810800687
key: test_fscore
value: [0.94736842 0.86956522 0.890625 0.92647059 0.97058824 0.90076336
0.96240602 0.92424242 0.94656489 0.94117647]
mean value: 0.9279770616116411
key: train_fscore
value: [0.99412259 0.99662732 0.99916037 0.99580889 0.99580185 0.9966443
1. 0.99748111 0.99496644 0.99412259]
mean value: 0.9964735439198161
key: test_precision
value: [0.94029851 0.84507042 0.91935484 0.9 0.94285714 0.90769231
0.95522388 0.92424242 0.95384615 0.91428571]
mean value: 0.9202871392228333
key: train_precision
value: [0.99328859 0.99831081 0.99832215 0.99331104 0.99496644 0.99497487
1. 0.9966443 0.99329983 0.99328859]
mean value: 0.9956406621581875
key: test_recall
value: [0.95454545 0.89552239 0.86363636 0.95454545 1. 0.89393939
0.96969697 0.92424242 0.93939394 0.96969697]
mean value: 0.9365219357756671
key: train_recall
value: [0.99495798 0.99494949 1. 0.99831933 0.99663866 0.99831933
1. 0.99831933 0.99663866 0.99495798]
mean value: 0.9973100755453697
key: test_roc_auc
value: [0.94742198 0.86442786 0.89393939 0.92424242 0.96969697 0.90151515
0.96212121 0.92424242 0.9469697 0.93939394]
mean value: 0.92739710538218
key: train_roc_auc
value: [0.99411199 0.99663441 0.99915966 0.99579832 0.99579832 0.99663866
1. 0.99747899 0.99495798 0.99411765]
mean value: 0.9964695979401862
key: test_jcc
value: [0.9 0.76923077 0.8028169 0.8630137 0.94285714 0.81944444
0.92753623 0.85915493 0.89855072 0.88888889]
mean value: 0.8671493731559037
key: train_jcc
value: [0.98831386 0.99327731 0.99832215 0.99165275 0.9916388 0.99331104
1. 0.99497487 0.98998331 0.98831386]
mean value: 0.9929787938678081
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.91420841 0.84525871 0.94702196 0.84990621 1.60249829 0.84771895
1.92725825 0.90147996 1.74981856 0.9151895 ]
mean value: 1.1500358819961547
key: score_time
value: [0.05293345 0.03867912 0.0567255 0.05308461 0.05238461 0.05297732
0.05538249 0.05317044 0.05231237 0.06014085]
mean value: 0.05277907848358154
key: test_mcc
value: [0.72969918 0.74830832 0.63753558 0.77711043 0.71883597 0.72861209
0.73960026 0.82158384 0.75792383 0.72861209]
mean value: 0.7387821583904562
key: train_mcc
value: [0.93672036 0.94183394 0.95164901 0.94492358 0.9316729 0.94476334
0.93475087 0.94156086 0.94133735 0.93829364]
mean value: 0.9407505841032879
key: test_accuracy
value: [0.86466165 0.87218045 0.81818182 0.88636364 0.85606061 0.86363636
0.86363636 0.90909091 0.87878788 0.86363636]
mean value: 0.8676236044657097
key: train_accuracy
value: [0.96804037 0.9705635 0.97563025 0.97226891 0.96554622 0.97226891
0.96722689 0.97058824 0.97058824 0.96890756]
mean value: 0.9701629078881342
key: test_fscore
value: [0.86567164 0.87943262 0.82352941 0.89208633 0.86524823 0.86764706
0.875 0.91304348 0.88059701 0.86764706]
mean value: 0.8729902846388133
key: train_fscore
value: [0.96864686 0.97109827 0.97597349 0.97265949 0.96614368 0.97256858
0.96763485 0.97100249 0.97085762 0.9693962 ]
mean value: 0.9705981520486012
key: test_precision
value: [0.85294118 0.83783784 0.8 0.84931507 0.81333333 0.84285714
0.80769231 0.875 0.86764706 0.84285714]
mean value: 0.8389481068365033
key: train_precision
value: [0.95137763 0.95299838 0.9624183 0.95915033 0.94967532 0.96217105
0.9557377 0.95751634 0.9620462 0.95439739]
mean value: 0.9567488661268432
key: test_recall
value: [0.87878788 0.92537313 0.84848485 0.93939394 0.92424242 0.89393939
0.95454545 0.95454545 0.89393939 0.89393939]
mean value: 0.910719131614654
key: train_recall
value: [0.98655462 0.98989899 0.98991597 0.98655462 0.98319328 0.98319328
0.97983193 0.98487395 0.97983193 0.98487395]
mean value: 0.9848722519310755
key: test_roc_auc
value: [0.86476707 0.87177748 0.81818182 0.88636364 0.85606061 0.86363636
0.86363636 0.90909091 0.87878788 0.86363636]
mean value: 0.8675938489371325
key: train_roc_auc
value: [0.96802479 0.97057975 0.97563025 0.97226891 0.96554622 0.97226891
0.96722689 0.97058824 0.97058824 0.96890756]
mean value: 0.9701629742806214
key: test_jcc
value: [0.76315789 0.78481013 0.7 0.80519481 0.7625 0.76623377
0.77777778 0.84 0.78666667 0.76623377]
mean value: 0.7752574803425902
key: train_jcc
value: [0.9392 0.94382022 0.95307443 0.94677419 0.93450479 0.94660194
0.93729904 0.94363929 0.9433657 0.94060995]
mean value: 0.9428889560478227
MCC on Blind test: 0.55
Accuracy on Blind test: 0.82
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [2.26034164 2.15656042 2.12556386 2.0825882 2.16070485 2.17109656
2.21937275 2.30224919 2.25836587 2.26037884]
mean value: 2.1997222185134886
key: score_time
value: [0.01029849 0.01075649 0.00986791 0.01148391 0.01252151 0.01258969
0.01346898 0.01144934 0.01176357 0.0115099 ]
mean value: 0.011570978164672851
key: test_mcc
value: [0.88134139 0.77857748 0.81855773 0.83573501 0.92690611 0.8196886
0.9701425 0.833429 0.90950859 0.85004744]
mean value: 0.862393386471883
key: train_mcc
value: [0.9849739 0.98160977 0.98328216 0.98158054 0.97997035 0.97660853
0.97655886 0.99329976 0.98664381 0.98498664]
mean value: 0.9829514304345836
key: test_accuracy
value: [0.93984962 0.88721805 0.90909091 0.91666667 0.96212121 0.90909091
0.98484848 0.91666667 0.95454545 0.92424242]
mean value: 0.930434039644566
key: train_accuracy
value: [0.99243061 0.99074853 0.99159664 0.9907563 0.98991597 0.98823529
0.98823529 0.99663866 0.99327731 0.99243697]
mean value: 0.9914271579111039
key: test_fscore
value: [0.94117647 0.89361702 0.90769231 0.91970803 0.96350365 0.91176471
0.98507463 0.91729323 0.95522388 0.92647059]
mean value: 0.9321524513052296
key: train_fscore
value: [0.99249374 0.99081036 0.99165275 0.99081036 0.99 0.98833333
0.98831386 0.99664992 0.9933222 0.99249374]
mean value: 0.9914880272309861
key: test_precision
value: [0.91428571 0.85135135 0.921875 0.88732394 0.92957746 0.88571429
0.97058824 0.91044776 0.94117647 0.9 ]
mean value: 0.9112340226878438
key: train_precision
value: [0.98509934 0.98341625 0.98507463 0.98504983 0.98181818 0.98016529
0.98175788 0.9933222 0.986733 0.98509934]
mean value: 0.984753594200818
key: test_recall
value: [0.96969697 0.94029851 0.89393939 0.95454545 1. 0.93939394
1. 0.92424242 0.96969697 0.95454545]
mean value: 0.9546359113523293
key: train_recall
value: [1. 0.9983165 0.99831933 0.99663866 0.99831933 0.99663866
0.99495798 1. 1. 1. ]
mean value: 0.9983190447896331
key: test_roc_auc
value: [0.94007237 0.88681592 0.90909091 0.91666667 0.96212121 0.90909091
0.98484848 0.91666667 0.95454545 0.92424242]
mean value: 0.9304161013116237
key: train_roc_auc
value: [0.99242424 0.99075489 0.99159664 0.9907563 0.98991597 0.98823529
0.98823529 0.99663866 0.99327731 0.99243697]
mean value: 0.9914271567212743
key: test_jcc
value: [0.88888889 0.80769231 0.83098592 0.85135135 0.92957746 0.83783784
0.97058824 0.84722222 0.91428571 0.8630137 ]
mean value: 0.8741443636484267
key: train_jcc
value: [0.98509934 0.98178808 0.98344371 0.98178808 0.98019802 0.97693575
0.97689769 0.9933222 0.986733 0.98509934]
mean value: 0.9831305207536616
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.16132116 0.24572635 0.1239059 0.07294703 0.23971868 0.23406434
0.09096909 0.06589794 0.17431712 0.08550954]
mean value: 0.14943771362304686
key: score_time
value: [0.0302527 0.04182434 0.0523839 0.02149272 0.0493772 0.02764463
0.01641536 0.01400828 0.01939535 0.01256824]
mean value: 0.028536272048950196
key: test_mcc
value: [0.34042787 0.28728495 0.14547859 0.26352314 0.23664319 0.2466911
0.30151134 0.27050089 0.25400025 0.22882178]
mean value: 0.25748831072069617
key: train_mcc
value: [0.28121216 0.27899486 0.28775432 0.26665917 0.27204268 0.35622176
0.27382004 0.30618622 0.27910312 0.27558913]
mean value: 0.287758344848468
key: test_accuracy
value: [0.60150376 0.57894737 0.53030303 0.57575758 0.5530303 0.56818182
0.58333333 0.56818182 0.56060606 0.56818182]
mean value: 0.5688026885395306
key: train_accuracy
value: [0.57359125 0.57190917 0.57647059 0.56638655 0.56890756 0.61260504
0.5697479 0.58571429 0.57226891 0.57058824]
mean value: 0.5768189496151699
key: test_fscore
value: [0.71351351 0.70526316 0.67708333 0.69892473 0.69109948 0.69518717
0.70588235 0.6984127 0.69473684 0.69189189]
mean value: 0.6971995163490601
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
key: train_fscore
value: [0.70123748 0.70005893 0.70247934 0.6975381 0.69876688 0.72077529
0.69917744 0.70707071 0.70041201 0.69958848]
mean value: 0.7027104644570163
key: test_precision
value: [0.55462185 0.54471545 0.51587302 0.54166667 0.528 0.53719008
0.54545455 0.53658537 0.53225806 0.53781513]
mean value: 0.5374180162953031
key: train_precision
value: [0.5399274 0.53853128 0.54140127 0.53555356 0.53700361 0.56344697
0.53748871 0.546875 0.53894928 0.53797468]
mean value: 0.5417151759223713
key: test_recall
value: [1. 1. 0.98484848 0.98484848 1. 0.98484848
1. 1. 1. 0.96969697]
mean value: 0.9924242424242424
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.60447761 0.57575758 0.53030303 0.57575758 0.5530303 0.56818182
0.58333333 0.56818182 0.56060606 0.56818182]
mean value: 0.5687810945273631
key: train_roc_auc
value: [0.57323232 0.57226891 0.57647059 0.56638655 0.56890756 0.61260504
0.5697479 0.58571429 0.57226891 0.57058824]
mean value: 0.57681903064256
key: test_jcc
value: [0.55462185 0.54471545 0.51181102 0.53719008 0.528 0.53278689
0.54545455 0.53658537 0.53225806 0.52892562]
mean value: 0.5352348883065589
key: train_jcc
value: [0.5399274 0.53853128 0.54140127 0.53555356 0.53700361 0.56344697
0.53748871 0.546875 0.53894928 0.53797468]
mean value: 0.5417151759223713
MCC on Blind test: 0.11
Accuracy on Blind test: 0.38
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02625012 0.03711963 0.0367558 0.04246736 0.06679559 0.04406571
0.03983307 0.07703924 0.02355957 0.05142474]
mean value: 0.04453108310699463
key: score_time
value: [0.07219577 0.03721213 0.03111935 0.02411985 0.03120399 0.02946353
0.03227925 0.03698516 0.02328372 0.03299832]
mean value: 0.03508610725402832
key: test_mcc
value: [0.748702 0.70036445 0.78368849 0.7800135 0.74420841 0.79373126
0.84119102 0.76072577 0.87919164 0.73029674]
mean value: 0.7762113288716164
key: train_mcc
value: [0.81479237 0.81790967 0.81158 0.81564917 0.81538413 0.81558668
0.80973991 0.82018755 0.82058077 0.81282825]
mean value: 0.8154238490138148
key: test_accuracy
value: [0.87218045 0.84962406 0.88636364 0.88636364 0.86363636 0.89393939
0.91666667 0.87878788 0.93939394 0.86363636]
mean value: 0.8850592390066074
key: train_accuracy
value: [0.90496215 0.90664424 0.90336134 0.90588235 0.90588235 0.90504202
0.90252101 0.90756303 0.90840336 0.90420168]
mean value: 0.9054463534783131
key: test_fscore
value: [0.87769784 0.85507246 0.8951049 0.89361702 0.87671233 0.9
0.92198582 0.88405797 0.94029851 0.86956522]
mean value: 0.891411206211467
key: train_fscore
value: [0.90996016 0.91127098 0.90836653 0.91025641 0.91011236 0.91024623
0.90749601 0.91242038 0.91259022 0.90894569]
mean value: 0.9101664971757291
key: test_precision
value: [0.83561644 0.83098592 0.83116883 0.84 0.8 0.85135135
0.86666667 0.84722222 0.92647059 0.83333333]
mean value: 0.8462815346826821
key: train_precision
value: [0.86515152 0.86757991 0.86363636 0.86983155 0.87096774 0.86295181
0.86342944 0.86686838 0.87269939 0.86605784]
mean value: 0.8669173928283019
key: test_recall
value: [0.92424242 0.88059701 0.96969697 0.95454545 0.96969697 0.95454545
0.98484848 0.92424242 0.95454545 0.90909091]
mean value: 0.9426051560379919
key: train_recall
value: [0.95966387 0.95959596 0.95798319 0.95462185 0.95294118 0.96302521
0.95630252 0.96302521 0.95630252 0.95630252]
mean value: 0.957976402682285
key: test_roc_auc
value: [0.87256897 0.84938942 0.88636364 0.88636364 0.86363636 0.89393939
0.91666667 0.87878788 0.93939394 0.86363636]
mean value: 0.8850746268656716
key: train_roc_auc
value: [0.90491611 0.90668874 0.90336134 0.90588235 0.90588235 0.90504202
0.90252101 0.90756303 0.90840336 0.90420168]
mean value: 0.9054461986814928
key: test_jcc
value: [0.78205128 0.74683544 0.81012658 0.80769231 0.7804878 0.81818182
0.85526316 0.79220779 0.88732394 0.76923077]
mean value: 0.8049400901115182
key: train_jcc
value: [0.83479532 0.83700441 0.83211679 0.83529412 0.83505155 0.83527697
0.83065693 0.83894583 0.83923304 0.83308931]
mean value: 0.8351464258960671
MCC on Blind test: 0.68
Accuracy on Blind test: 0.87
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:136: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:139: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.98454618 0.67307448 0.86623049 0.59880996 0.63273072 0.63469291
0.69719315 0.63847065 0.83392954 0.6913631 ]
mean value: 0.7251041173934937
key: score_time
value: [0.06120563 0.03548265 0.03454709 0.03712463 0.02196646 0.04404831
0.02621794 0.02708268 0.03657389 0.02833247]
mean value: 0.03525817394256592
key: test_mcc
value: [0.748702 0.71431805 0.76072577 0.7800135 0.78816781 0.79373126
0.80758535 0.78932976 0.84887469 0.74456392]
mean value: 0.777601212041837
key: train_mcc
value: [0.81479237 0.84539007 0.8465731 0.81564917 0.83512819 0.81558668
0.82970909 0.83719788 0.83285454 0.83465317]
mean value: 0.8307534248261067
key: test_accuracy
value: [0.87218045 0.85714286 0.87878788 0.88636364 0.88636364 0.89393939
0.90151515 0.89393939 0.92424242 0.87121212]
mean value: 0.8865686944634312
key: train_accuracy
value: [0.90496215 0.92094197 0.92184874 0.90588235 0.91596639 0.90504202
0.91344538 0.91680672 0.91512605 0.91596639]
mean value: 0.9135988154723622
key: test_fscore
value: [0.87769784 0.85925926 0.88405797 0.89361702 0.89655172 0.9
0.90647482 0.89705882 0.92537313 0.87591241]
mean value: 0.8916003004175677
key: train_fscore
value: [0.90996016 0.92431562 0.92493947 0.91025641 0.9194847 0.91024623
0.91686844 0.92048193 0.91835085 0.91922456]
mean value: 0.9174128360722797
key: test_precision
value: [0.83561644 0.85294118 0.84722222 0.84 0.82278481 0.85135135
0.8630137 0.87142857 0.91176471 0.84507042]
mean value: 0.8541193397003181
key: train_precision
value: [0.86515152 0.88580247 0.88975155 0.86983155 0.88253478 0.86295181
0.88198758 0.88153846 0.8847352 0.88491446]
mean value: 0.8789199372030476
key: test_recall
value: [0.92424242 0.86567164 0.92424242 0.95454545 0.98484848 0.95454545
0.95454545 0.92424242 0.93939394 0.90909091]
mean value: 0.9335368611488014
key: train_recall
value: [0.95966387 0.96632997 0.96302521 0.95462185 0.95966387 0.96302521
0.95462185 0.96302521 0.95462185 0.95630252]
mean value: 0.9594901394901395
key: test_roc_auc
value: [0.87256897 0.85707825 0.87878788 0.88636364 0.88636364 0.89393939
0.90151515 0.89393939 0.92424242 0.87121212]
mean value: 0.8866010854816825
key: train_roc_auc
value: [0.90491611 0.92098011 0.92184874 0.90588235 0.91596639 0.90504202
0.91344538 0.91680672 0.91512605 0.91596639]
mean value: 0.9135980250686133
key: test_jcc
value: [0.78205128 0.75324675 0.79220779 0.80769231 0.8125 0.81818182
0.82894737 0.81333333 0.86111111 0.77922078]
mean value: 0.804849254546623
key: train_jcc
value: [0.83479532 0.85928144 0.86036036 0.83529412 0.8509687 0.83527697
0.84649776 0.85267857 0.8490284 0.85052317]
mean value: 0.8474704813594193
MCC on Blind test: 0.66
Accuracy on Blind test: 0.86
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.12481308 0.16899014 0.2003262 0.20602155 0.16480041 0.11521244
0.11547208 0.13901997 0.20162272 0.07334018]
mean value: 0.15096187591552734
key: score_time
value: [0.02007031 0.01953554 0.03566623 0.05057263 0.02884102 0.03782344
0.01230049 0.01505136 0.0135622 0.01324034]
mean value: 0.024666357040405273
key: test_mcc
value: [0.69961549 0.60900045 0.68568568 0.79373126 0.74420841 0.69986771
0.75897093 0.75897093 0.833429 0.74456392]
mean value: 0.7328043779458789
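Note: each "mean value" line appears to be the plain arithmetic mean of the ten per-fold scores printed above it. A small check using the test_mcc values copied from this log (rounded to eight decimals as printed, so agreement is only expected to about that precision):

```python
from statistics import fmean

# Per-fold test_mcc values for Logistic Regression, copied from the log.
test_mcc = [
    0.69961549, 0.60900045, 0.68568568, 0.79373126, 0.74420841,
    0.69986771, 0.75897093, 0.75897093, 0.833429, 0.74456392,
]

mean_mcc = fmean(test_mcc)
logged_mean = 0.7328043779458789  # "mean value" reported in the log

print(f"recomputed: {mean_mcc:.8f}  logged: {logged_mean:.8f}")
```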
key: train_mcc
value: [0.76766604 0.78845283 0.76980875 0.76610944 0.76595745 0.77672743
0.75440121 0.77176593 0.75793243 0.77079477]
mean value: 0.7689616274532732
key: test_accuracy
value: [0.84962406 0.80451128 0.84090909 0.89393939 0.86363636 0.84848485
0.87878788 0.87878788 0.91666667 0.87121212]
mean value: 0.8646559580770107
key: train_accuracy
value: [0.88309504 0.89318755 0.88403361 0.88235294 0.88235294 0.88739496
0.87647059 0.88487395 0.87815126 0.88487395]
mean value: 0.8836786792092783
key: test_fscore
value: [0.85074627 0.80597015 0.84892086 0.9 0.87671233 0.85507246
0.88235294 0.88235294 0.91729323 0.87591241]
mean value: 0.8695333597949811
key: train_fscore
value: [0.88671557 0.89683184 0.88780488 0.8858075 0.88562092 0.89123377
0.8801956 0.88888889 0.8820179 0.88779689]
mean value: 0.8872913750285026
key: test_precision
value: [0.83823529 0.80597015 0.80821918 0.85135135 0.8 0.81944444
0.85714286 0.85714286 0.91044776 0.84507042]
mean value: 0.8393024315264321
key: train_precision
value: [0.86075949 0.86656201 0.85984252 0.86053883 0.86168521 0.86185243
0.85443038 0.85893417 0.85488959 0.8658147 ]
mean value: 0.8605309333357611
key: test_recall
value: [0.86363636 0.80597015 0.89393939 0.95454545 0.96969697 0.89393939
0.90909091 0.90909091 0.92424242 0.90909091]
mean value: 0.9033242876526458
key: train_recall
value: [0.91428571 0.92929293 0.91764706 0.91260504 0.91092437 0.92268908
0.90756303 0.9210084 0.91092437 0.91092437]
mean value: 0.9157864357864358
key: test_roc_auc
value: [0.84972863 0.80450023 0.84090909 0.89393939 0.86363636 0.84848485
0.87878788 0.87878788 0.91666667 0.87121212]
mean value: 0.8646653098145636
key: train_roc_auc
value: [0.88306878 0.89321789 0.88403361 0.88235294 0.88235294 0.88739496
0.87647059 0.88487395 0.87815126 0.88487395]
mean value: 0.8836790877967348
key: test_jcc
value: [0.74025974 0.675 0.7375 0.81818182 0.7804878 0.74683544
0.78947368 0.78947368 0.84722222 0.77922078]
mean value: 0.7703655176221637
key: train_jcc
value: [0.79648609 0.81296024 0.79824561 0.79502196 0.79472141 0.80380673
0.7860262 0.8 0.78893741 0.7982327 ]
mean value: 0.7974438350039706
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
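Note: the key names in the block above (fit_time, score_time, test_mcc, train_mcc, ...) match what scikit-learn's cross_validate returns when it is given a scoring dictionary and return_train_score=True. The sketch below shows how such output could be produced. It is an assumption about the general approach, not the author's script: the synthetic data, the scorer choices behind each key and the fold settings are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=400, n_features=15, random_state=42)

# Keys chosen to mirror the names in the log; the scorers are assumed.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())
```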
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [3.47399139 3.63057613 3.74403596 3.35848165 3.5461297 2.97885227
3.33522224 3.00847769 3.41326523 3.35739779]
mean value: 3.384643006324768
key: score_time
value: [0.01660132 0.02039504 0.02370691 0.02178669 0.0171504 0.03919339
0.01123929 0.02096009 0.02390099 0.02181339]
mean value: 0.021674752235412598
key: test_mcc
value: [0.74440174 0.67051692 0.66943868 0.76320314 0.74420841 0.72760688
0.74250948 0.80312249 0.80534465 0.71285802]
mean value: 0.7383210399826687
key: train_mcc
value: [0.83036998 0.84573107 0.82558796 0.82425731 0.77542098 0.81049737
0.80032673 0.78134411 0.80732609 0.82067938]
mean value: 0.8121540966146651
key: test_accuracy
value: [0.87218045 0.83458647 0.83333333 0.87878788 0.86363636 0.86363636
0.87121212 0.90151515 0.90151515 0.85606061]
mean value: 0.8676463886990202
key: train_accuracy
value: [0.91505467 0.92262405 0.91260504 0.91176471 0.88739496 0.90504202
0.9 0.88991597 0.90336134 0.91008403]
mean value: 0.9057846788841692
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_fscore
value: [0.87022901 0.83076923 0.84057971 0.88571429 0.87671233 0.86567164
0.87218045 0.90225564 0.90510949 0.85925926]
mean value: 0.8708481043356118
key: train_fscore
value: [0.91618257 0.92384106 0.91390728 0.91358025 0.88962109 0.90653433
0.90140845 0.89323553 0.90519373 0.91164327]
mean value: 0.9075147566196161
key: test_precision
value: [0.87692308 0.85714286 0.80555556 0.83783784 0.8 0.85294118
0.86567164 0.89552239 0.87323944 0.84057971]
mean value: 0.8505413680545307
key: train_precision
value: [0.90491803 0.90879479 0.9004894 0.89516129 0.8723748 0.89250814
0.88888889 0.86708861 0.88834951 0.8961039 ]
mean value: 0.8914677356328867
key: test_recall
value: [0.86363636 0.80597015 0.87878788 0.93939394 0.96969697 0.87878788
0.87878788 0.90909091 0.93939394 0.87878788]
mean value: 0.8942333785617368
key: train_recall
value: [0.92773109 0.93939394 0.92773109 0.93277311 0.90756303 0.9210084
0.91428571 0.9210084 0.92268908 0.92773109]
mean value: 0.9241914947797301
key: test_roc_auc
value: [0.87211669 0.83480326 0.83333333 0.87878788 0.86363636 0.86363636
0.87121212 0.90151515 0.90151515 0.85606061]
mean value: 0.8676616915422886
key: train_roc_auc
value: [0.915044 0.92263815 0.91260504 0.91176471 0.88739496 0.90504202
0.9 0.88991597 0.90336134 0.91008403]
mean value: 0.9057850210791387
key: test_jcc
value: [0.77027027 0.71052632 0.725 0.79487179 0.7804878 0.76315789
0.77333333 0.82191781 0.82666667 0.75324675]
mean value: 0.7719478642012361
key: train_jcc
value: [0.84532925 0.85846154 0.84146341 0.84090909 0.80118694 0.8290469
0.82051282 0.80706922 0.82680723 0.83763278]
mean value: 0.8308419181684118
MCC on Blind test: 0.66
Accuracy on Blind test: 0.86
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0162425 0.0160017 0.01585889 0.01603508 0.01597643 0.01602697
0.01607943 0.01623201 0.01599336 0.01613903]
mean value: 0.01605854034423828
key: score_time
value: [0.01126313 0.01120281 0.01129889 0.01132226 0.01130223 0.01134753
0.01124859 0.01127553 0.01130271 0.01125908]
mean value: 0.01128227710723877
key: test_mcc
value: [0.42990719 0.37538744 0.42919754 0.44066028 0.53183137 0.49306684
0.43944438 0.42600643 0.5768179 0.50471461]
mean value: 0.4647033999224429
key: train_mcc
value: [0.49601743 0.51214528 0.4853481 0.4943284 0.49286649 0.48000922
0.48289321 0.50592018 0.48083138 0.48305298]
mean value: 0.49134126650583104
key: test_accuracy
value: [0.71428571 0.68421053 0.71212121 0.71969697 0.76515152 0.74242424
0.71969697 0.71212121 0.78787879 0.75 ]
mean value: 0.7307587149692413
key: train_accuracy
value: [0.74684609 0.75441548 0.74117647 0.74621849 0.74537815 0.73865546
0.74033613 0.75210084 0.7394958 0.74033613]
mean value: 0.7444959043331378
key: test_fscore
value: [0.6984127 0.6557377 0.68852459 0.70866142 0.75590551 0.71666667
0.71755725 0.6984127 0.78125 0.73170732]
mean value: 0.7152835856689457
key: train_fscore
value: [0.73433363 0.73928571 0.72597865 0.73462214 0.73303965 0.72404614
0.72727273 0.74145486 0.72759227 0.72679045]
mean value: 0.7314416230885522
key: test_precision
value: [0.73333333 0.72727273 0.75 0.73770492 0.78688525 0.7962963
0.72307692 0.73333333 0.80645161 0.78947368]
mean value: 0.7583828074360791
key: train_precision
value: [0.7732342 0.78707224 0.77126654 0.76979742 0.77037037 0.76691729
0.76579926 0.77472527 0.76243094 0.76679104]
mean value: 0.770840458530029
key: test_recall
value: [0.66666667 0.59701493 0.63636364 0.68181818 0.72727273 0.65151515
0.71212121 0.66666667 0.75757576 0.68181818]
mean value: 0.6778833107191315
key: train_recall
value: [0.69915966 0.6969697 0.68571429 0.70252101 0.69915966 0.68571429
0.69243697 0.71092437 0.69579832 0.6907563 ]
mean value: 0.6959154570919277
key: test_roc_auc
value: [0.71393035 0.6848711 0.71212121 0.71969697 0.76515152 0.74242424
0.71969697 0.71212121 0.78787879 0.75 ]
mean value: 0.7307892356399819
key: train_roc_auc
value: [0.74688623 0.7543672 0.74117647 0.74621849 0.74537815 0.73865546
0.74033613 0.75210084 0.7394958 0.74033613]
mean value: 0.7444950909656792
key: test_jcc
value: [0.53658537 0.48780488 0.525 0.54878049 0.60759494 0.55844156
0.55952381 0.53658537 0.64102564 0.57692308]
mean value: 0.5578265120183923
key: train_jcc
value: [0.58019526 0.58640227 0.5698324 0.58055556 0.57858136 0.5674548
0.57142857 0.58913649 0.5718232 0.57083333]
mean value: 0.5766243242866349
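Note: for a binary classifier the per-fold Jaccard score (jcc) and F1 score (fscore) are linked by J = F1 / (2 - F1), because F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN). The check below uses the Gaussian NB test_fscore and test_jcc values copied from this log. The identity holds fold by fold, not for the means.

```python
def jaccard_from_f1(f1: float) -> float:
    """Convert a binary F1 score to the equivalent Jaccard index."""
    return f1 / (2.0 - f1)

# Per-fold values for Gaussian NB, copied from the log.
test_fscore = [
    0.6984127, 0.6557377, 0.68852459, 0.70866142, 0.75590551,
    0.71666667, 0.71755725, 0.6984127, 0.78125, 0.73170732,
]
test_jcc = [
    0.53658537, 0.48780488, 0.525, 0.54878049, 0.60759494,
    0.55844156, 0.55952381, 0.53658537, 0.64102564, 0.57692308,
]

# Largest disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(
    abs(jaccard_from_f1(f1) - jcc) for f1, jcc in zip(test_fscore, test_jcc)
)
print(f"max |derived - logged| = {max_abs_diff:.2e}")
```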
MCC on Blind test: 0.41
Accuracy on Blind test: 0.76
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01833224 0.02844119 0.03023386 0.03021145 0.04428649 0.03315878
0.05487323 0.02842569 0.03093243 0.02116919]
mean value: 0.032006454467773435
key: score_time
value: [0.01621819 0.02138519 0.02097058 0.02414799 0.02312398 0.03966141
0.02990365 0.03438926 0.02296233 0.01327443]
mean value: 0.02460370063781738
key: test_mcc
value: [0.5339213 0.53558614 0.44562679 0.57575758 0.67161876 0.60717674
0.57602211 0.60633906 0.63753558 0.59097693]
mean value: 0.5780560993827096
key: train_mcc
value: [0.62994548 0.60817175 0.60363763 0.60684612 0.62694743 0.62016807
0.6103634 0.60178352 0.6067364 0.58220454]
mean value: 0.6096804333287658
key: test_accuracy
value: [0.76691729 0.76691729 0.71969697 0.78787879 0.83333333 0.8030303
0.78787879 0.8030303 0.81818182 0.79545455]
mean value: 0.7882319434951014
key: train_accuracy
value: [0.81497056 0.80403701 0.80168067 0.80336134 0.81344538 0.81008403
0.80504202 0.80084034 0.80336134 0.7907563 ]
mean value: 0.8047578997957467
key: test_fscore
value: [0.76691729 0.75968992 0.69421488 0.78787879 0.84285714 0.796875
0.78461538 0.8 0.82352941 0.79389313]
mean value: 0.7850470948633774
key: train_fscore
value: [0.81481481 0.80203908 0.79863481 0.80135823 0.81469115 0.81008403
0.80204778 0.79898219 0.80269815 0.78552972]
mean value: 0.8030879959995846
key: test_precision
value: [0.76119403 0.79032258 0.76363636 0.78787879 0.7972973 0.82258065
0.796875 0.8125 0.8 0.8 ]
mean value: 0.7932284704469647
key: train_precision
value: [0.81618887 0.80960549 0.81109185 0.80960549 0.8092869 0.81008403
0.81455806 0.80650685 0.80541455 0.80565371]
mean value: 0.8097995804820648
key: test_recall
value: [0.77272727 0.73134328 0.63636364 0.78787879 0.89393939 0.77272727
0.77272727 0.78787879 0.84848485 0.78787879]
mean value: 0.779194934418815
key: train_recall
value: [0.81344538 0.79461279 0.78655462 0.79327731 0.82016807 0.81008403
0.78991597 0.79159664 0.8 0.76638655]
mean value: 0.7966041366041366
key: test_roc_auc
value: [0.76696065 0.76718679 0.71969697 0.78787879 0.83333333 0.8030303
0.78787879 0.8030303 0.81818182 0.79545455]
mean value: 0.7882632293080054
key: train_roc_auc
value: [0.81497185 0.80402909 0.80168067 0.80336134 0.81344538 0.81008403
0.80504202 0.80084034 0.80336134 0.7907563 ]
mean value: 0.8047572362278245
key: test_jcc
value: [0.62195122 0.6125 0.53164557 0.65 0.72839506 0.66233766
0.64556962 0.66666667 0.7 0.65822785]
mean value: 0.6477293648219603
key: train_jcc
value: [0.6875 0.66950355 0.66477273 0.66855524 0.68732394 0.68079096
0.66951567 0.66525424 0.67042254 0.64680851]
mean value: 0.671044737093254
MCC on Blind test: 0.52
Accuracy on Blind test: 0.81
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.02491474 0.01506805 0.01512551 0.01483035 0.01469612 0.01485538
0.01491857 0.0151217 0.01491117 0.0147326 ]
mean value: 0.015917420387268066
key: score_time
value: [0.03499436 0.02949691 0.02866745 0.02961278 0.03052759 0.02934623
0.02845168 0.03219891 0.03506231 0.02963805]
mean value: 0.03079962730407715
key: test_mcc
value: [0.53140339 0.6557787 0.42443734 0.67445327 0.56855832 0.5992912
0.66943868 0.67445327 0.64109064 0.50709255]
mean value: 0.5945997369934125
key: train_mcc
value: [0.75352205 0.7313818 0.7451251 0.72677403 0.73661389 0.74492321
0.74481758 0.72071332 0.72352842 0.7393187 ]
mean value: 0.7366718091955822
key: test_accuracy
value: [0.7593985 0.82706767 0.71212121 0.83333333 0.78030303 0.79545455
0.83333333 0.83333333 0.81818182 0.75 ]
mean value: 0.794252677147414
key: train_accuracy
value: [0.87384357 0.86291001 0.8697479 0.85882353 0.86470588 0.86890756
0.8697479 0.85630252 0.85630252 0.86638655]
mean value: 0.8647677944180195
key: test_fscore
value: [0.78082192 0.83453237 0.71641791 0.84507042 0.7972028 0.81118881
0.84057971 0.84507042 0.82857143 0.76923077]
mean value: 0.8068686563765856
key: train_fscore
value: [0.88132911 0.87073751 0.87727633 0.86915888 0.8735271 0.87735849
0.87708168 0.8663018 0.86774942 0.87470449]
mean value: 0.8735224811615062
key: test_precision
value: [0.7125 0.80555556 0.70588235 0.78947368 0.74025974 0.75324675
0.80555556 0.78947368 0.78378378 0.71428571]
mean value: 0.7600016824049332
key: train_precision
value: [0.83258595 0.82308846 0.82934132 0.80986938 0.820059 0.82422452
0.83033033 0.80994152 0.80372493 0.82344214]
mean value: 0.8206607530876882
key: test_recall
value: [0.86363636 0.86567164 0.72727273 0.90909091 0.86363636 0.87878788
0.87878788 0.90909091 0.87878788 0.83333333]
mean value: 0.8608095884215288
key: train_recall
value: [0.93613445 0.92424242 0.93109244 0.93781513 0.93445378 0.93781513
0.92941176 0.93109244 0.94285714 0.93277311]
mean value: 0.9337687802393685
key: test_roc_auc
value: [0.76017639 0.82677521 0.71212121 0.83333333 0.78030303 0.79545455
0.83333333 0.83333333 0.81818182 0.75 ]
mean value: 0.7943012211668928
key: train_roc_auc
value: [0.87379113 0.86296155 0.8697479 0.85882353 0.86470588 0.86890756
0.8697479 0.85630252 0.85630252 0.86638655]
mean value: 0.8647677050618228
key: test_jcc
value: [0.64044944 0.71604938 0.55813953 0.73170732 0.6627907 0.68235294
0.725 0.73170732 0.70731707 0.625 ]
mean value: 0.678051370196998
key: train_jcc
value: [0.78783593 0.77106742 0.78138223 0.76859504 0.77545328 0.78151261
0.78107345 0.76413793 0.76639344 0.77731092]
mean value: 0.7754762238935481
MCC on Blind test: 0.45
Accuracy on Blind test: 0.76
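Note: one way to read the train and test rows together is the gap between mean train_mcc and mean test_mcc. The snippet below only does that subtraction for two of the models in this section, using the mean values copied from the log. Treating a larger gap as a sign of weaker generalisation is an interpretation, not something the log itself states.

```python
# Mean cross-validation MCC values copied from the log: (train, test).
mean_mcc = {
    "Logistic Regression": (0.7689616274532732, 0.7328043779458789),
    "K-Nearest Neighbors": (0.7366718091955822, 0.5945997369934125),
}

# Train-minus-test gap per model.
gaps = {name: train - test for name, (train, test) in mean_mcc.items()}

for name, gap in gaps.items():
    print(f"{name}: train-test MCC gap = {gap:.4f}")
```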
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.09510469 0.10395908 0.10734057 0.10678625 0.10629964 0.1047132
0.14021826 0.16393042 0.11608148 0.1044817 ]
mean value: 0.11489152908325195
key: score_time
value: [0.02618408 0.03990245 0.03856707 0.0683701 0.03701854 0.03856254
0.04088354 0.03897285 0.03920102 0.03909039]
mean value: 0.04067525863647461
key: test_mcc
value: [0.613804 0.6557787 0.5992912 0.72635073 0.73125738 0.66943868
0.73960026 0.71417356 0.77711043 0.73267501]
mean value: 0.6959479952127847
key: train_mcc
value: [0.72363501 0.74493969 0.75019202 0.73374129 0.73069778 0.73406597
0.72761525 0.73620188 0.7226944 0.75068681]
mean value: 0.7354470114857837
key: test_accuracy
value: [0.80451128 0.82706767 0.79545455 0.85606061 0.85606061 0.83333333
0.86363636 0.85606061 0.88636364 0.86363636]
mean value: 0.8442185007974482
key: train_accuracy
value: [0.85870479 0.86963835 0.87142857 0.86386555 0.86302521 0.86386555
0.8605042 0.86386555 0.85798319 0.87226891]
mean value: 0.8645149868189497
key: test_fscore
value: [0.81428571 0.83453237 0.81118881 0.86896552 0.8707483 0.84057971
0.875 0.86131387 0.89208633 0.87142857]
mean value: 0.8540129197258242
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
key: train_fscore
value: [0.86750789 0.87708168 0.87981147 0.87203791 0.87032617 0.87223975
0.86929134 0.8734375 0.86703383 0.87993681]
mean value: 0.8728704351424548
key: test_precision
value: [0.77027027 0.80555556 0.75324675 0.79746835 0.79012346 0.80555556
0.80769231 0.83098592 0.84931507 0.82432432]
mean value: 0.8034537561851378
key: train_precision
value: [0.81723626 0.82908546 0.8259587 0.82265276 0.82628399 0.82169391
0.81777778 0.81605839 0.81508876 0.83010432]
mean value: 0.8221940319020319
key: test_recall
value: [0.86363636 0.86567164 0.87878788 0.95454545 0.96969697 0.87878788
0.95454545 0.89393939 0.93939394 0.92424242]
mean value: 0.9123247399366803
key: train_recall
value: [0.92436975 0.93097643 0.94117647 0.92773109 0.91932773 0.92941176
0.92773109 0.9394958 0.92605042 0.93613445]
mean value: 0.9302405002405003
key: test_roc_auc
value: [0.80495251 0.82677521 0.79545455 0.85606061 0.85606061 0.83333333
0.86363636 0.85606061 0.88636364 0.86363636]
mean value: 0.8442333785617367
key: train_roc_auc
value: [0.85864952 0.8696899 0.87142857 0.86386555 0.86302521 0.86386555
0.8605042 0.86386555 0.85798319 0.87226891]
mean value: 0.8645146139263786
key: test_jcc
value: [0.68674699 0.71604938 0.68235294 0.76829268 0.77108434 0.725
0.77777778 0.75641026 0.80519481 0.7721519 ]
mean value: 0.7461061070237571
key: train_jcc
value: [0.76601671 0.78107345 0.78541374 0.77310924 0.77042254 0.77342657
0.76880223 0.77531207 0.76527778 0.78561354]
mean value: 0.7744467869457157
MCC on Blind test: 0.66
Accuracy on Blind test: 0.86
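The `key:` / `value:` / `mean value:` triplets above can be read back into per-fold lists for checking. The following is a minimal sketch, not the script that produced this log: the function name `parse_cv_blocks` and the regular expression are assumptions, and it only handles blocks where `value:` directly follows the `key:` line. It re-derives the logged `test_mcc` mean for the SVM block.

```python
import re
from statistics import fmean


def parse_cv_blocks(text):
    """Return {metric_name: [per-fold floats]} from 'key:' / 'value: [...]' blocks."""
    scores = {}
    # A metric name, then a bracketed array that may wrap across several lines.
    pattern = re.compile(r"key:\s*(\w+)\s*\nvalue:\s*\[([^\]]*)\]")
    for key, body in pattern.findall(text):
        scores[key] = [float(token) for token in body.split()]
    return scores


# Block copied from the SVM section of this log.
sample = """key: test_mcc
value: [0.613804 0.6557787 0.5992912 0.72635073 0.73125738 0.66943868
 0.73960026 0.71417356 0.77711043 0.73267501]
mean value: 0.6959479952127847
"""

scores = parse_cv_blocks(sample)
mean_mcc = fmean(scores["test_mcc"])
print(len(scores["test_mcc"]), mean_mcc)
```

Because the per-fold values are printed to about eight significant digits, the recomputed mean should agree with the logged `mean value` only up to rounding.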
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [ 5.61930633 10.23130774 10.04514098 6.87647343 3.24669886 5.83537579
5.23547888 5.2183423 6.56729603 11.27240038]
mean value: 7.014782071113586
key: score_time
value: [0.01726365 0.02066183 0.01314092 0.01353931 0.01514196 0.0191195
0.02220035 0.02619338 0.02870178 0.0352571 ]
mean value: 0.021121978759765625
key: test_mcc
value: [0.86558065 0.70694867 0.84119102 0.84515425 0.87177979 0.83806027
0.89901011 0.8824419 0.89486432 0.82773811]
mean value: 0.84727690965409
key: train_mcc
value: [0.97990346 0.96843712 0.95708143 0.93488809 0.92357126 0.98823669
0.93468125 0.97315872 0.9412097 0.98162491]
mean value: 0.9582792637063065
key: test_accuracy
value: [0.93233083 0.84962406 0.91666667 0.91666667 0.93181818 0.91666667
0.9469697 0.93939394 0.9469697 0.90909091]
mean value: 0.9206197311460469
key: train_accuracy
value: [0.98990749 0.98402019 0.97815126 0.96638655 0.96134454 0.99411765
0.96638655 0.98655462 0.97058824 0.9907563 ]
mean value: 0.97882133845969
key: test_fscore
value: [0.93333333 0.86111111 0.92198582 0.92307692 0.93617021 0.92086331
0.94964029 0.94202899 0.94573643 0.91549296]
mean value: 0.9249439370374717
key: train_fscore
value: [0.98998331 0.98423237 0.9785832 0.96747967 0.96217105 0.9941127
0.96742671 0.98662207 0.97046414 0.99082569]
mean value: 0.9791900900647359
key: test_precision
value: [0.91304348 0.80519481 0.86666667 0.85714286 0.88 0.87671233
0.90410959 0.90277778 0.96825397 0.85526316]
mean value: 0.88291646289999
key: train_precision
value: [0.98341625 0.9705401 0.95961228 0.93700787 0.94202899 0.99494949
0.93838863 0.98169717 0.97457627 0.98344371]
mean value: 0.966566075938182
key: test_recall
value: [0.95454545 0.92537313 0.98484848 1. 1. 0.96969697
1. 0.98484848 0.92424242 0.98484848]
mean value: 0.9728403437358661
key: train_recall
value: [0.99663866 0.9983165 0.99831933 1. 0.98319328 0.99327731
0.99831933 0.99159664 0.96638655 0.99831933]
mean value: 0.9924366918484565
key: test_roc_auc
value: [0.93249661 0.8490502 0.91666667 0.91666667 0.93181818 0.91666667
0.9469697 0.93939394 0.9469697 0.90909091]
mean value: 0.9205789235639983
key: train_roc_auc
value: [0.98990182 0.9840322 0.97815126 0.96638655 0.96134454 0.99411765
0.96638655 0.98655462 0.97058824 0.9907563 ]
mean value: 0.978821973233738
key: test_jcc
value: [0.875 0.75609756 0.85526316 0.85714286 0.88 0.85333333
0.90410959 0.89041096 0.89705882 0.84415584]
mean value: 0.8612572124976998
key: train_jcc
value: [0.98016529 0.96895425 0.95806452 0.93700787 0.92709984 0.98829431
0.93690852 0.97359736 0.94262295 0.98181818]
mean value: 0.9594533093393642
MCC on Blind test: 0.65
Accuracy on Blind test: 0.86
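The `test_jcc` values in these blocks are consistent with the `test_fscore` values through the standard identity J = F1 / (2 - F1), which holds when both are computed from the same binary predictions. A small sketch of that check, using the first three MLP folds copied from the log; the helper name `jaccard_from_f1` is an assumption introduced here.

```python
def jaccard_from_f1(f1):
    """Jaccard index implied by an F1 score on the same binary predictions."""
    # F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN), so J = F1 / (2 - F1).
    return f1 / (2.0 - f1)


# First three folds of the MLP block, copied from the log.
test_fscore = [0.93333333, 0.86111111, 0.92198582]
test_jcc = [0.875, 0.75609756, 0.85526316]

derived = [jaccard_from_f1(f) for f in test_fscore]
for logged_j, derived_j in zip(test_jcc, derived):
    print(f"logged={logged_j:.8f} derived={derived_j:.8f}")
```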
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.09097266 0.06637001 0.06557441 0.06874728 0.0701592 0.07333708
0.06844783 0.07955551 0.06623769 0.07119703]
mean value: 0.07205986976623535
key: score_time
value: [0.01410508 0.0142591 0.01380372 0.0135386 0.01349068 0.01329613
0.01343465 0.0135026 0.01350117 0.01335311]
mean value: 0.013628482818603516
key: test_mcc
value: [0.85122361 0.81953867 0.92690611 0.87177979 0.91076511 0.8824419
0.95553309 0.89901011 0.93982555 0.9251987 ]
mean value: 0.8982222631451187
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.92481203 0.90977444 0.96212121 0.93181818 0.95454545 0.93939394
0.97727273 0.9469697 0.96969697 0.96212121]
mean value: 0.9478525860104807
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92647059 0.91044776 0.96350365 0.93617021 0.95588235 0.94202899
0.97777778 0.94964029 0.97014925 0.96296296]
mean value: 0.9495033832520609
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.9 0.91044776 0.92957746 0.88 0.92857143 0.90277778
0.95652174 0.90410959 0.95588235 0.94202899]
mean value: 0.9209917098951922
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.91044776 1. 1. 0.98484848 0.98484848
1. 1. 0.98484848 0.98484848]
mean value: 0.9804387155133424
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92503392 0.90976934 0.96212121 0.93181818 0.95454545 0.93939394
0.97727273 0.9469697 0.96969697 0.96212121]
mean value: 0.9478742650384442
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.8630137 0.83561644 0.92957746 0.88 0.91549296 0.89041096
0.95652174 0.90410959 0.94202899 0.92857143]
mean value: 0.9045343260675828
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.89
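As a worked example of how the per-fold metrics relate, the sketch below computes MCC and balanced accuracy from a confusion matrix. The counts are not in the log: TP=66, FP=5, FN=0, TN=61 are inferred here from the third Decision Tree fold (precision 0.92957746, recall 1.0, accuracy 0.96212121) and should be treated as an assumption. With those counts the balanced accuracy also matches the logged `test_roc_auc` for that fold, which is what ROC AUC reduces to when it is computed from hard class labels; whether the script does so is likewise an inference.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Inferred (not logged) counts for Decision Tree, fold 3.
tp, fp, fn, tn = 66, 5, 0, 61

fold_mcc = mcc(tp, fp, fn, tn)
fold_bal_acc = balanced_accuracy(tp, fp, fn, tn)
print(fold_mcc, fold_bal_acc)
```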
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.31542921 0.28832293 0.30350065 0.30793715 0.28838658 0.33395696
0.19791198 0.19763827 0.20193744 0.20983505]
mean value: 0.26448562145233157
key: score_time
value: [0.02723169 0.02744985 0.02742767 0.0272634 0.02694702 0.02197075
0.01971865 0.02128482 0.02003503 0.02088046]
mean value: 0.024020934104919435
key: test_mcc
value: [0.88011764 0.84996625 0.82158384 0.89486432 0.8824419 0.86612538
0.92690611 0.91287093 0.95465504 0.88040627]
mean value: 0.8869937674866786
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93984962 0.92481203 0.90909091 0.9469697 0.93939394 0.93181818
0.96212121 0.95454545 0.97727273 0.93939394]
mean value: 0.9425267714741399
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94029851 0.92647059 0.91304348 0.94814815 0.94202899 0.93430657
0.96350365 0.95652174 0.97709924 0.94117647]
mean value: 0.9442597372952238
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.92647059 0.91304348 0.875 0.92753623 0.90277778 0.90140845
0.92957746 0.91666667 0.98461538 0.91428571]
mean value: 0.9191381757218723
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.94029851 0.95454545 0.96969697 0.98484848 0.96969697
1. 1. 0.96969697 0.96969697]
mean value: 0.971302578018996
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93995929 0.92469471 0.90909091 0.9469697 0.93939394 0.93181818
0.96212121 0.95454545 0.97727273 0.93939394]
mean value: 0.9425260063319765
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.88732394 0.8630137 0.84 0.90140845 0.89041096 0.87671233
0.92957746 0.91666667 0.95522388 0.88888889]
mean value: 0.894922628160887
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.49
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01355791 0.01356125 0.01352477 0.01341796 0.01332808 0.01341581
0.01773357 0.01977444 0.02038264 0.01979041]
mean value: 0.015848684310913085
key: score_time
value: [0.00930381 0.00938654 0.00943065 0.00930262 0.00963712 0.00926065
0.01308894 0.01308584 0.01314425 0.0130682 ]
mean value: 0.010870862007141113
key: test_mcc
value: [0.82915052 0.74830832 0.7800135 0.85201287 0.85478752 0.78816781
0.82425939 0.82773811 0.78816781 0.81060226]
mean value: 0.8103208098401626
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.90977444 0.87218045 0.88636364 0.92424242 0.92424242 0.88636364
0.90909091 0.90909091 0.88636364 0.90151515]
mean value: 0.9009227614490772
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.91549296 0.87943262 0.89361702 0.92753623 0.92857143 0.89655172
0.91428571 0.91549296 0.89655172 0.90780142]
mean value: 0.9075333802339808
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85526316 0.83783784 0.84 0.88888889 0.87837838 0.82278481
0.86486486 0.85526316 0.82278481 0.85333333]
mean value: 0.8519399239345942
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.92537313 0.95454545 0.96969697 0.98484848 0.98484848
0.96969697 0.98484848 0.98484848 0.96969697]
mean value: 0.9713251922207147
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.91033469 0.87177748 0.88636364 0.92424242 0.92424242 0.88636364
0.90909091 0.90909091 0.88636364 0.90151515]
mean value: 0.9009384893713251
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.84415584 0.78481013 0.80769231 0.86486486 0.86666667 0.8125
0.84210526 0.84415584 0.8125 0.83116883]
mean value: 0.8310619748444532
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.38
Accuracy on Blind test: 0.76
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [4.82775021 4.74955702 3.40215826 3.2394557 3.25225616 3.24565363
3.2456162 3.36915302 3.59266257 3.57445192]
mean value: 3.6498714685440063
key: score_time
value: [0.14048791 0.14100409 0.10365963 0.10351729 0.1028018 0.11075878
0.11101532 0.11172509 0.10809517 0.1088829 ]
mean value: 0.11419479846954346
key: test_mcc
value: [0.94028503 0.85299767 0.93982555 0.92690611 0.94112395 0.85478752
0.94112395 0.94112395 0.95465504 0.89651574]
mean value: 0.9189344490412299
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96992481 0.92481203 0.96969697 0.96212121 0.96969697 0.92424242
0.96969697 0.96969697 0.97727273 0.9469697 ]
mean value: 0.9584130781499203
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97014925 0.92857143 0.97014925 0.96350365 0.97058824 0.92857143
0.97058824 0.97058824 0.97744361 0.94890511]
mean value: 0.959905843863454
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95588235 0.89041096 0.95588235 0.92957746 0.94285714 0.87837838
0.94285714 0.94285714 0.97014925 0.91549296]
mean value: 0.9324345148002824
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.97014925 0.98484848 1. 1. 0.98484848
1. 1. 0.98484848 0.98484848]
mean value: 0.9894391677973767
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97003618 0.92446857 0.96969697 0.96212121 0.96969697 0.92424242
0.96969697 0.96969697 0.97727273 0.9469697 ]
mean value: 0.95838986883763
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94202899 0.86666667 0.94202899 0.92957746 0.94285714 0.86666667
0.94285714 0.94285714 0.95588235 0.90277778]
mean value: 0.9234200328426941
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
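One way to compare the models logged so far is to set the mean cross-validated `test_mcc` against the blind-test MCC and rank by the difference. The sketch below only rearranges numbers copied from the blocks above; the `gap` column and the choice to sort by it are assumptions made for illustration, not something the original script reports.

```python
# (mean CV test_mcc, blind-test MCC) copied from the log blocks above.
logged = {
    "SVM": (0.6959479952127847, 0.66),
    "MLP": (0.84727690965409, 0.65),
    "Decision Tree": (0.8982222631451187, 0.72),
    "Extra Trees": (0.8869937674866786, 0.49),
    "Extra Tree": (0.8103208098401626, 0.38),
    "Random Forest": (0.9189344490412299, 0.76),
}

# Difference between cross-validated and blind-test MCC for each model.
gaps = {name: cv - blind for name, (cv, blind) in logged.items()}

# Largest gap first.
ranked = sorted(gaps, key=gaps.get, reverse=True)
for name in ranked:
    cv, blind = logged[name]
    print(f"{name:<14} cv={cv:.3f} blind={blind:.2f} gap={gaps[name]:.3f}")
```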
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.3209312 1.44431853 1.58112407 1.51067591 1.5859437 2.19948959
1.38442206 1.93691373 1.28188467 1.27489114]
mean value: 1.5520594596862793
key: score_time
value: [0.15015864 0.19255185 0.17973113 0.18013835 0.17181015 0.19955444
0.27024174 0.23727822 0.20012379 0.24910641]
mean value: 0.20306947231292724
key: test_mcc
value: [0.89567796 0.80667588 0.89404202 0.92690611 0.94112395 0.83806027
0.89486432 0.90950859 0.93982555 0.88040627]
mean value: 0.892709092074466
key: train_mcc
value: [0.95517845 0.96007108 0.94492358 0.95674042 0.9502296 0.9484901
0.94492358 0.94686596 0.94860815 0.94524429]
mean value: 0.9501275201585805
key: test_accuracy
value: [0.94736842 0.90225564 0.9469697 0.96212121 0.96969697 0.91666667
0.9469697 0.95454545 0.96969697 0.93939394]
mean value: 0.9455684666210982
key: train_accuracy
value: [0.97729184 0.97981497 0.97226891 0.97815126 0.97478992 0.97394958
0.97226891 0.97310924 0.97394958 0.97226891]
mean value: 0.9747863114968442
key: test_fscore
value: [0.94814815 0.90647482 0.94736842 0.96350365 0.97058824 0.92086331
0.94814815 0.95522388 0.97014925 0.94117647]
mean value: 0.9471644336691079
key: train_fscore
value: [0.97770438 0.9800995 0.97265949 0.97847682 0.97524752 0.97440132
0.97265949 0.97359736 0.97444353 0.97279472]
mean value: 0.9752084130865094
key: test_precision
value: [0.92753623 0.875 0.94029851 0.92957746 0.94285714 0.87671233
0.92753623 0.94117647 0.95588235 0.91428571]
mean value: 0.9230862445458927
key: train_precision
value: [0.96103896 0.96568627 0.95915033 0.96411093 0.95786062 0.95779221
0.95915033 0.95623987 0.95631068 0.95469256]
mean value: 0.9592032749258542
key: test_recall
value: [0.96969697 0.94029851 0.95454545 1. 1. 0.96969697
0.96969697 0.96969697 0.98484848 0.96969697]
mean value: 0.9728177295341475
key: train_recall
value: [0.99495798 0.99494949 0.98655462 0.99327731 0.99327731 0.99159664
0.98655462 0.99159664 0.99327731 0.99159664]
mean value: 0.9917638570579748
key: test_roc_auc
value: [0.94753505 0.90196744 0.9469697 0.96212121 0.96969697 0.91666667
0.9469697 0.95454545 0.96969697 0.93939394]
mean value: 0.9455563093622795
key: train_roc_auc
value: [0.97727697 0.97982769 0.97226891 0.97815126 0.97478992 0.97394958
0.97226891 0.97310924 0.97394958 0.97226891]
mean value: 0.9747860962566846
key: test_jcc
value: [0.90140845 0.82894737 0.9 0.92957746 0.94285714 0.85333333
0.90140845 0.91428571 0.94202899 0.88888889]
mean value: 0.9002735799490561
key: train_jcc
value: [0.95638126 0.96097561 0.94677419 0.95786062 0.95169082 0.95008052
0.94677419 0.94855305 0.95016077 0.9470305 ]
mean value: 0.9516281533345908
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
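Note on the FutureWarning logged for this model: the warning text itself says that `max_features='auto'` is deprecated and that `'sqrt'` keeps the past behaviour for RandomForestClassifier. A minimal sketch of that substitution follows, assuming the estimator keyword arguments are held in a plain dict before the model is built. The helper name `modernise_forest_params` and the dict `rf2_params` are hypothetical and do not come from the original script.

```python
def modernise_forest_params(params):
    """Return a copy of params with the deprecated max_features='auto'
    replaced by 'sqrt', the value the warning names as equivalent."""
    updated = dict(params)  # copy, so the caller's dict is left untouched
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Keyword arguments as printed for 'Random Forest2' in the log above.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

fixed_params = modernise_forest_params(rf2_params)
print(fixed_params["max_features"])
```

The resulting dict could then be unpacked into the classifier constructor. Removing the `max_features` key altogether should have the same effect, since the warning states that `'sqrt'` is also the default.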
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.06151128 0.05858254 0.06312013 0.07686377 0.06964469 0.07349968
0.05664515 0.03685856 0.06115079 0.06200194]
mean value: 0.06198785305023193
key: score_time
value: [0.03485608 0.03934264 0.03880668 0.03443718 0.03964758 0.03490686
0.02829528 0.05180097 0.02917194 0.03911948]
mean value: 0.0370384693145752
key: test_mcc
value: [0.5339213 0.53558614 0.44562679 0.57575758 0.67161876 0.60717674
0.57602211 0.60633906 0.63753558 0.59097693]
mean value: 0.5780560993827096
key: train_mcc
value: [0.62994548 0.60817175 0.60363763 0.60684612 0.62694743 0.62016807
0.6103634 0.60178352 0.6067364 0.58220454]
mean value: 0.6096804333287658
key: test_accuracy
value: [0.76691729 0.76691729 0.71969697 0.78787879 0.83333333 0.8030303
0.78787879 0.8030303 0.81818182 0.79545455]
mean value: 0.7882319434951014
key: train_accuracy
value: [0.81497056 0.80403701 0.80168067 0.80336134 0.81344538 0.81008403
0.80504202 0.80084034 0.80336134 0.7907563 ]
mean value: 0.8047578997957467
key: test_fscore
value: [0.76691729 0.75968992 0.69421488 0.78787879 0.84285714 0.796875
0.78461538 0.8 0.82352941 0.79389313]
mean value: 0.7850470948633774
key: train_fscore
value: [0.81481481 0.80203908 0.79863481 0.80135823 0.81469115 0.81008403
0.80204778 0.79898219 0.80269815 0.78552972]
mean value: 0.8030879959995846
key: test_precision
value: [0.76119403 0.79032258 0.76363636 0.78787879 0.7972973 0.82258065
0.796875 0.8125 0.8 0.8 ]
mean value: 0.7932284704469647
key: train_precision
value: [0.81618887 0.80960549 0.81109185 0.80960549 0.8092869 0.81008403
0.81455806 0.80650685 0.80541455 0.80565371]
mean value: 0.8097995804820648
key: test_recall
value: [0.77272727 0.73134328 0.63636364 0.78787879 0.89393939 0.77272727
0.77272727 0.78787879 0.84848485 0.78787879]
mean value: 0.779194934418815
key: train_recall
value: [0.81344538 0.79461279 0.78655462 0.79327731 0.82016807 0.81008403
0.78991597 0.79159664 0.8 0.76638655]
mean value: 0.7966041366041366
key: test_roc_auc
value: [0.76696065 0.76718679 0.71969697 0.78787879 0.83333333 0.8030303
0.78787879 0.8030303 0.81818182 0.79545455]
mean value: 0.7882632293080054
key: train_roc_auc
value: [0.81497185 0.80402909 0.80168067 0.80336134 0.81344538 0.81008403
0.80504202 0.80084034 0.80336134 0.7907563 ]
mean value: 0.8047572362278245
key: test_jcc
value: [0.62195122 0.6125 0.53164557 0.65 0.72839506 0.66233766
0.64556962 0.66666667 0.7 0.65822785]
mean value: 0.6477293648219603
key: train_jcc
value: [0.6875 0.66950355 0.66477273 0.66855524 0.68732394 0.68079096
0.66951567 0.66525424 0.67042254 0.64680851]
mean value: 0.671044737093254
MCC on Blind test: 0.52
Accuracy on Blind test: 0.81
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [9.41463685 8.58698535 8.8384304 9.8259964 7.32652545 2.57784414
5.06993341 9.8626914 8.55897713 8.41459346]
mean value: 7.847661399841309
key: score_time
value: [0.02014756 0.02379441 0.02771115 0.01952624 0.01420593 0.01656413
0.02535224 0.03231311 0.02562571 0.01690316]
mean value: 0.022214365005493165
key: test_mcc
value: [0.91355192 0.86703475 0.94112395 0.91287093 0.94112395 0.85478752
0.9701425 0.91287093 0.93982555 0.91076511]
mean value: 0.9164097097925172
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.95488722 0.93233083 0.96969697 0.95454545 0.96969697 0.92424242
0.98484848 0.95454545 0.96969697 0.95454545]
mean value: 0.9569036226930964
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.95652174 0.9352518 0.97058824 0.95652174 0.97058824 0.92857143
0.98507463 0.95652174 0.97014925 0.95588235]
mean value: 0.9585671148650311
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91666667 0.90277778 0.94285714 0.91666667 0.94285714 0.87837838
0.97058824 0.91666667 0.95588235 0.92857143]
mean value: 0.9271912458677165
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 0.97014925 1. 1. 1. 0.98484848
1. 1. 0.98484848 0.98484848]
mean value: 0.9924694708276798
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.95522388 0.93204432 0.96969697 0.95454545 0.96969697 0.92424242
0.98484848 0.95454545 0.96969697 0.95454545]
mean value: 0.9569086386250565
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.91666667 0.87837838 0.94285714 0.91666667 0.94285714 0.86666667
0.97058824 0.91666667 0.94202899 0.91549296]
mean value: 0.9208869509307174
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
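Note on reading these blocks: every train metric for XGBoost is 1.0 while the cross-validated test MCC averages about 0.92, so the train/test gap is worth tracking per model. The sketch below parses the `key:` / `value: [...]` layout used in this log and computes that gap with the standard library only. The function name `parse_cv_scores` is hypothetical, and the sample text is copied from the XGBoost block above.

```python
import re
from statistics import mean


def parse_cv_scores(log_text):
    """Return {metric name: [per-fold values]} from 'key:' / 'value: [...]' blocks.

    The bracketed list may wrap over several lines, so the pattern matches
    everything up to the closing bracket.
    """
    pattern = re.compile(r"key:\s*(\S+)\s*value:\s*\[([^\]]*)\]")
    scores = {}
    for key, body in pattern.findall(log_text):
        scores[key] = [float(token) for token in body.split()]
    return scores


sample = """key: test_mcc
value: [0.91355192 0.86703475 0.94112395 0.91287093 0.94112395 0.85478752
 0.9701425  0.91287093 0.93982555 0.91076511]
mean value: 0.9164097097925172
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""

scores = parse_cv_scores(sample)
gap = mean(scores["train_mcc"]) - mean(scores["test_mcc"])
print(f"train/test MCC gap: {gap:.4f}")
```

A large gap is only a hint of overfitting. The blind-test figures reported at the end of each block remain the more direct check.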
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.14503193 0.15870619 0.1729548 0.17024684 0.18737078 0.19274426
0.13104177 0.13881731 0.1534431 0.18702292]
mean value: 0.16373798847198487
key: score_time
value: [0.03879404 0.02698874 0.03794503 0.02734399 0.03862238 0.02691483
0.02693462 0.02665114 0.01282382 0.02664685]
mean value: 0.028966546058654785
key: test_mcc
value: [0.68430574 0.65422886 0.66943868 0.76642417 0.77521709 0.68378319
0.74250948 0.69986771 0.7431924 0.67161876]
mean value: 0.7090586089861183
key: train_mcc
value: [0.80866657 0.81346009 0.8086448 0.79321598 0.79555584 0.79465334
0.77715689 0.78250788 0.78328462 0.80270163]
mean value: 0.795984763671047
key: test_accuracy
value: [0.84210526 0.82706767 0.83333333 0.87878788 0.87878788 0.84090909
0.87121212 0.84848485 0.87121212 0.83333333]
mean value: 0.8525233538391433
key: train_accuracy
value: [0.90328007 0.9058032 0.90336134 0.89579832 0.89663866 0.89663866
0.88823529 0.8907563 0.8907563 0.90084034]
mean value: 0.8972108473330459
key: test_fscore
value: [0.84210526 0.82706767 0.84057971 0.88732394 0.89041096 0.84671533
0.87218045 0.85507246 0.87407407 0.84285714]
mean value: 0.8578387005336142
key: train_fscore
value: [0.90673155 0.90879479 0.90658002 0.8990228 0.90040486 0.89959184
0.89053498 0.89344262 0.89430894 0.90327869]
mean value: 0.9002691083913814
key: test_precision
value: [0.8358209 0.83333333 0.80555556 0.82894737 0.8125 0.81690141
0.86567164 0.81944444 0.85507246 0.7972973 ]
mean value: 0.8270544408583936
key: train_precision
value: [0.87617555 0.88012618 0.87735849 0.87203791 0.86875 0.87460317
0.87258065 0.872 0.86614173 0.8816 ]
mean value: 0.8741373688860552
key: test_recall
value: [0.84848485 0.82089552 0.87878788 0.95454545 0.98484848 0.87878788
0.87878788 0.89393939 0.89393939 0.89393939]
mean value: 0.8926956128448665
key: train_recall
value: [0.9394958 0.93939394 0.93781513 0.92773109 0.93445378 0.92605042
0.9092437 0.91596639 0.92436975 0.92605042]
mean value: 0.9280570409982175
key: test_roc_auc
value: [0.84215287 0.82711443 0.83333333 0.87878788 0.87878788 0.84090909
0.87121212 0.84848485 0.87121212 0.83333333]
mean value: 0.8525327905924921
key: train_roc_auc
value: [0.90324958 0.90583142 0.90336134 0.89579832 0.89663866 0.89663866
0.88823529 0.8907563 0.8907563 0.90084034]
mean value: 0.8972106216223864
key: test_jcc
value: [0.72727273 0.70512821 0.725 0.79746835 0.80246914 0.73417722
0.77333333 0.74683544 0.77631579 0.72839506]
mean value: 0.7516395265397042
key: train_jcc
value: [0.82937685 0.83283582 0.82912333 0.81656805 0.81885125 0.81750742
0.80267062 0.80740741 0.80882353 0.82361734]
mean value: 0.8186781620728142
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.02964282 0.03476715 0.04506683 0.04617333 0.05880022 0.04431558
0.04388618 0.04419994 0.04364896 0.04388213]
mean value: 0.04343831539154053
key: score_time
value: [0.01396203 0.0264771 0.0231936 0.02345204 0.02391791 0.02365637
0.02127576 0.02268791 0.02417183 0.02144814]
mean value: 0.02242426872253418
key: test_mcc
value: [0.56460751 0.55326567 0.4598545 0.50144101 0.62128344 0.56222174
0.54645907 0.57815159 0.68252363 0.59097693]
mean value: 0.5660785084147357
key: train_mcc
value: [0.56986488 0.58646496 0.57833507 0.55833105 0.57858369 0.58860201
0.55339154 0.57663023 0.5750832 0.55191644]
mean value: 0.5717203080182092
key: test_accuracy
value: [0.78195489 0.77443609 0.72727273 0.75 0.81060606 0.78030303
0.77272727 0.78787879 0.84090909 0.79545455]
mean value: 0.7821542492595124
key: train_accuracy
value: [0.78469302 0.79310345 0.78907563 0.7789916 0.78907563 0.79411765
0.77647059 0.78823529 0.78739496 0.77563025]
mean value: 0.7856788064258504
key: test_fscore
value: [0.78518519 0.76190476 0.70491803 0.74015748 0.81203008 0.77165354
0.77941176 0.77777778 0.84444444 0.79389313]
mean value: 0.7771376195385946
key: train_fscore
value: [0.78044597 0.78974359 0.78638298 0.77502139 0.78491859 0.79041916
0.77186964 0.78571429 0.78394535 0.77002584]
mean value: 0.7818486790915893
key: test_precision
value: [0.76811594 0.81355932 0.76785714 0.7704918 0.80597015 0.80327869
0.75714286 0.81666667 0.82608696 0.8 ]
mean value: 0.79291695283083
key: train_precision
value: [0.79684764 0.80208333 0.79655172 0.78919861 0.8006993 0.80487805
0.78809107 0.79518072 0.796875 0.78975265]
mean value: 0.7960158090319096
key: test_recall
value: [0.8030303 0.71641791 0.65151515 0.71212121 0.81818182 0.74242424
0.8030303 0.74242424 0.86363636 0.78787879]
mean value: 0.7640660334690186
key: train_recall
value: [0.76470588 0.77777778 0.77647059 0.76134454 0.7697479 0.77647059
0.75630252 0.77647059 0.77142857 0.7512605 ]
mean value: 0.7681979458450047
key: test_roc_auc
value: [0.78211217 0.77487562 0.72727273 0.75 0.81060606 0.78030303
0.77272727 0.78787879 0.84090909 0.79545455]
mean value: 0.7822139303482587
key: train_roc_auc
value: [0.78470984 0.79309057 0.78907563 0.7789916 0.78907563 0.79411765
0.77647059 0.78823529 0.78739496 0.77563025]
mean value: 0.7856792009733187
key: test_jcc
value: [0.64634146 0.61538462 0.5443038 0.5875 0.6835443 0.62820513
0.63855422 0.63636364 0.73076923 0.65822785]
mean value: 0.6369194240371803
key: train_jcc
value: [0.63994374 0.65254237 0.64796634 0.63268156 0.64598025 0.65346535
0.62849162 0.64705882 0.64466292 0.62605042]
mean value: 0.6418843403318552
MCC on Blind test: 0.53
Accuracy on Blind test: 0.81
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.03986382 0.07160902 0.06057954 0.05892277 0.07447052 0.06678772
0.06090355 0.05563188 0.0627687 0.07016468]
mean value: 0.062170219421386716
key: score_time
value: [0.02108192 0.02115917 0.02505922 0.02454782 0.02111268 0.02110505
0.02623105 0.03131795 0.02457404 0.02104902]
mean value: 0.023723793029785157
key: test_mcc
value: [0.56157905 0.537956 0.61702991 0.60234703 0.69293487 0.64379631
0.66776115 0.59133807 0.67426617 0.7800135 ]
mean value: 0.6369022051810874
key: train_mcc
value: [0.64621096 0.70065565 0.70049556 0.66869353 0.75909618 0.75437781
0.68454294 0.66327631 0.66120137 0.74719859]
mean value: 0.6985748896957277
key: test_accuracy
value: [0.76691729 0.7593985 0.79545455 0.77272727 0.83333333 0.81818182
0.81818182 0.78030303 0.82575758 0.88636364]
mean value: 0.8056618819776714
key: train_accuracy
value: [0.80824222 0.83347351 0.84453782 0.81428571 0.87815126 0.87478992
0.82605042 0.81764706 0.82184874 0.86890756]
mean value: 0.838793421489706
key: test_fscore
value: [0.72072072 0.78947368 0.76106195 0.8125 0.85333333 0.83098592
0.84210526 0.73873874 0.8 0.89361702]
mean value: 0.8042536623833423
key: train_fscore
value: [0.77470356 0.85547445 0.82917821 0.84134961 0.88315874 0.8814638
0.84901532 0.78704612 0.79886148 0.87850467]
mean value: 0.8378755963279767
key: test_precision
value: [0.88888889 0.70588235 0.91489362 0.69148936 0.76190476 0.77631579
0.74418605 0.91111111 0.93877551 0.84 ]
mean value: 0.8173447439758736
key: train_precision
value: [0.94004796 0.75515464 0.92008197 0.73433584 0.84829721 0.83685801
0.75 0.94575472 0.91721133 0.81857765]
mean value: 0.8466319322006147
key: test_recall
value: [0.60606061 0.89552239 0.65151515 0.98484848 0.96969697 0.89393939
0.96969697 0.62121212 0.6969697 0.95454545]
mean value: 0.824400723654455
key: train_recall
value: [0.65882353 0.98653199 0.75462185 0.98487395 0.9210084 0.93109244
0.97815126 0.67394958 0.70756303 0.94789916]
mean value: 0.8544515179809298
key: test_roc_auc
value: [0.76571687 0.75836725 0.79545455 0.77272727 0.83333333 0.81818182
0.81818182 0.78030303 0.82575758 0.88636364]
mean value: 0.8054387155133425
key: train_roc_auc
value: [0.80836799 0.83360213 0.84453782 0.81428571 0.87815126 0.87478992
0.82605042 0.81764706 0.82184874 0.86890756]
mean value: 0.8388188608776844
key: test_jcc
value: [0.56338028 0.65217391 0.61428571 0.68421053 0.74418605 0.71084337
0.72727273 0.58571429 0.66666667 0.80769231]
mean value: 0.6756425842686714
key: train_jcc
value: [0.63225806 0.74744898 0.70820189 0.72614622 0.79076479 0.78805121
0.73764259 0.64886731 0.66508689 0.78333333]
mean value: 0.7227801277927314
MCC on Blind test: 0.27
Accuracy on Blind test: 0.76
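
Note on the scores above: test_roc_auc equals test_accuracy on every fold except the first two, and train_roc_auc tracks train_accuracy the same way. This pattern is consistent with ROC AUC being computed from hard 0/1 predictions rather than from decision scores or probabilities, in which case it reduces to balanced accuracy, and balanced accuracy equals plain accuracy when a fold holds equal numbers of both classes. That reading is an inference from the numbers, not something the log states. A minimal sketch on hypothetical labels (the 66/66 fold composition is assumed from the recall and accuracy fractions, and the 20% error rate is arbitrary):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical balanced fold: 66 negatives and 66 positives (assumed, not logged).
y_true = np.array([0] * 66 + [1] * 66)

# Hard 0/1 predictions: flip roughly 20% of the true labels.
flip = rng.random(y_true.size) < 0.2
y_pred = np.where(flip, 1 - y_true, y_true)

# ROC AUC on hard labels has a single operating point, so it equals (TPR + TNR) / 2.
auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print("roc_auc on hard labels:", auc_hard)
print("balanced accuracy:     ", bal_acc)
print("accuracy:              ", acc)
```

If a ranking-based AUC is wanted instead, the usual route is to score on decision_function or predict_proba output rather than on predicted labels.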
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.05123115 0.0788691 0.09172297 0.07682514 0.06595588 0.0708189
0.06753492 0.07790565 0.0734241 0.08075595]
mean value: 0.07350437641143799
key: score_time
value: [0.02121162 0.02126455 0.021348 0.02149343 0.0214448 0.02145123
0.02145767 0.02133083 0.02132106 0.02133274]
mean value: 0.021365594863891602
key: test_mcc
value: [0.69085775 0.65190379 0.55436714 0.72760688 0.71417356 0.44226898
0.75792383 0.57835174 0.45716359 0.75725927]
mean value: 0.6331876526299302
key: train_mcc
value: [0.77641553 0.75718089 0.55341157 0.76342639 0.75572242 0.57306029
0.77079477 0.54488981 0.50295754 0.72800226]
mean value: 0.6725861465309467
key: test_accuracy
value: [0.84210526 0.81203008 0.74242424 0.86363636 0.85606061 0.6969697
0.87878788 0.75757576 0.68181818 0.87121212]
mean value: 0.8002620186830713
key: train_accuracy
value: [0.88477712 0.87384357 0.74537815 0.87983193 0.87647059 0.75630252
0.88487395 0.73865546 0.71092437 0.85378151]
mean value: 0.820483917705013
key: test_fscore
value: [0.85106383 0.7826087 0.66 0.86153846 0.86131387 0.60784314
0.88059701 0.68627451 0.54347826 0.88275862]
mean value: 0.7617476399134425
key: train_fscore
value: [0.89204098 0.86288848 0.66885246 0.87356322 0.87093942 0.68614719
0.88779689 0.65559247 0.60277136 0.86917293]
mean value: 0.78697653961389
key: test_precision
value: [0.8 0.9375 0.97058824 0.875 0.83098592 0.86111111
0.86764706 0.97222222 0.96153846 0.81012658]
mean value: 0.8886719586760881
key: train_precision
value: [0.83976261 0.944 0.95625 0.92164179 0.91176471 0.96352584
0.8658147 0.96103896 0.96309963 0.78639456]
mean value: 0.9113292790413379
key: test_recall
value: [0.90909091 0.67164179 0.5 0.84848485 0.89393939 0.46969697
0.89393939 0.53030303 0.37878788 0.96969697]
mean value: 0.706558118498417
key: train_recall
value: [0.9512605 0.79461279 0.51428571 0.8302521 0.83361345 0.53277311
0.91092437 0.49747899 0.43865546 0.97142857]
mean value: 0.7275285063520358
key: test_roc_auc
value: [0.84260516 0.81309362 0.74242424 0.86363636 0.85606061 0.6969697
0.87878788 0.75757576 0.68181818 0.87121212]
mean value: 0.8004183627317956
key: train_roc_auc
value: [0.88472116 0.87377699 0.74537815 0.87983193 0.87647059 0.75630252
0.88487395 0.73865546 0.71092437 0.85378151]
mean value: 0.8204716634128398
key: test_jcc
value: [0.74074074 0.64285714 0.49253731 0.75675676 0.75641026 0.43661972
0.78666667 0.52238806 0.37313433 0.79012346]
mean value: 0.6298234440024083
key: train_jcc
value: [0.80512091 0.75884244 0.50246305 0.7755102 0.77138414 0.52224053
0.7982327 0.48764415 0.43140496 0.76861702]
mean value: 0.6621460103083406
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
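
Note on the layout above: the key / value / mean value triplets, with fit_time, score_time and paired test_* / train_* entries over 10 folds, have the shape of the dictionary returned by scikit-learn's cross_validate when called with several scorers and return_train_score=True. The script that produced this log is not shown, so the following is only a sketch of how such output could be generated. The synthetic data, the scorer names and the StratifiedKFold settings are assumptions, and the preprocessing is reduced to a single MinMaxScaler.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the real feature matrix is not part of this log.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Simplified pipeline: scaling followed by the classifier.
pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    ("model", SGDClassifier(random_state=42)),
])

# Scorer names chosen to mirror the keys in the log (assumed mapping).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each metric in the same key / value / mean value form as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Keeping the scaler inside the pipeline means it is refit on the training part of each fold, so the test_* scores are not influenced by statistics of the held-out rows.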
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.46430111 0.63151526 0.62315559 0.49676085 0.48500848 0.48636103
0.56049109 0.52820277 0.47167945 0.49920678]
mean value: 0.5246682405471802
key: score_time
value: [0.02340293 0.02379346 0.040169 0.02477193 0.02459431 0.02452683
0.02468944 0.023803 0.02333736 0.02350903]
mean value: 0.025659728050231933
key: test_mcc
value: [0.89567796 0.80525437 0.84848485 0.86853519 0.89901011 0.82158384
0.89404202 0.77352678 0.9251987 0.833429 ]
mean value: 0.8564742825088401
key: train_mcc
value: [0.91641203 0.94997419 0.91334836 0.92034737 0.90140194 0.92680468
0.91322951 0.90448915 0.90191224 0.92034737]
mean value: 0.916826682329092
key: test_accuracy
value: [0.94736842 0.90225564 0.92424242 0.93181818 0.9469697 0.90909091
0.9469697 0.88636364 0.96212121 0.91666667]
mean value: 0.92738664843928
key: train_accuracy
value: [0.95794786 0.97476871 0.95630252 0.95966387 0.95042017 0.96302521
0.95630252 0.95210084 0.95042017 0.95966387]
mean value: 0.9580615728208861
key: test_fscore
value: [0.94814815 0.90510949 0.92424242 0.9352518 0.94964029 0.91304348
0.94736842 0.88888889 0.96296296 0.91729323]
mean value: 0.9291949132020663
key: train_fscore
value: [0.95867769 0.97512438 0.95716639 0.96059113 0.95127993 0.96375618
0.95709571 0.9526971 0.95159967 0.96059113]
mean value: 0.958857931089391
key: test_precision
value: [0.92753623 0.88571429 0.92424242 0.89041096 0.90410959 0.875
0.94029851 0.86956522 0.94202899 0.91044776]
mean value: 0.906935396134124
key: train_precision
value: [0.94308943 0.96078431 0.93861066 0.93900482 0.93506494 0.9450727
0.94003241 0.94098361 0.92948718 0.93900482]
mean value: 0.941113487171725
key: test_recall
value: [0.96969697 0.92537313 0.92424242 0.98484848 1. 0.95454545
0.95454545 0.90909091 0.98484848 0.92424242]
mean value: 0.9531433740388964
key: train_recall
value: [0.97478992 0.98989899 0.97647059 0.98319328 0.96806723 0.98319328
0.97478992 0.96470588 0.97478992 0.98319328]
mean value: 0.9773092267209914
key: test_roc_auc
value: [0.94753505 0.90208051 0.92424242 0.93181818 0.9469697 0.90909091
0.9469697 0.88636364 0.96212121 0.91666667]
mean value: 0.9273857982813207
key: train_roc_auc
value: [0.95793368 0.97478143 0.95630252 0.95966387 0.95042017 0.96302521
0.95630252 0.95210084 0.95042017 0.95966387]
mean value: 0.9580614265908384
key: test_jcc
value: [0.90140845 0.82666667 0.85915493 0.87837838 0.90410959 0.84
0.9 0.8 0.92857143 0.84722222]
mean value: 0.8685511665161482
key: train_jcc
value: [0.92063492 0.95145631 0.9178515 0.92417062 0.90708661 0.93004769
0.91772152 0.90966719 0.90766823 0.92417062]
mean value: 0.9210475218786636
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
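
Note on the warning above: the logged model is BaggingClassifier(n_jobs=10, oob_score=True, random_state=42), which leaves n_estimators at its scikit-learn default of 10. With so few bootstrap draws, some training rows end up in every bag and never receive an out-of-bag prediction, and the division in oob_decision_function then produces the RuntimeWarning for those rows. The sketch below reproduces the effect on synthetic data and shows that a larger ensemble avoids it. The dataset and the value 200 are illustrative assumptions, not settings taken from this run.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data, roughly the size of the logged training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)


def oob_warning_raised(n_estimators):
    """Fit a bagging ensemble and report whether the OOB UserWarning fired."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model = BaggingClassifier(n_estimators=n_estimators, oob_score=True,
                                  random_state=42)
        model.fit(X, y)
    return any("OOB" in str(w.message) for w in caught)


few = oob_warning_raised(10)    # default ensemble size, as in the logged run
many = oob_warning_raised(200)  # larger ensemble (illustrative value)

print("warning with 10 estimators: ", few)
print("warning with 200 estimators:", many)
```

The cross-validation and blind-test scores in this log do not depend on the OOB estimate, so the warning affects only oob_score_ and oob_decision_function_.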
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.26204967 0.27810788 0.28340149 0.26984215 0.28152084 0.25897765
0.30171061 0.28025818 0.28854394 0.30127692]
mean value: 0.28056893348693845
key: score_time
value: [0.02732182 0.02710104 0.03132987 0.02990603 0.03089166 0.0326097
0.03289962 0.02989483 0.0305655 0.03311324]
mean value: 0.03056333065032959
key: test_mcc
value: [0.92577528 0.74631701 0.91287093 0.91287093 0.89651574 0.86853519
0.93939394 0.95553309 0.93982555 0.8824419 ]
mean value: 0.8980079544979622
key: train_mcc
value: [0.9932941 0.99495514 0.99497063 0.99497063 0.99832074 0.99495939
0.99663866 0.99663866 0.99663866 0.99664429]
mean value: 0.995803087652558
key: test_accuracy
value: [0.96240602 0.87218045 0.95454545 0.95454545 0.9469697 0.93181818
0.96969697 0.97727273 0.96969697 0.93939394]
mean value: 0.9478525860104807
key: train_accuracy
value: [0.99663583 0.99747687 0.99747899 0.99747899 0.99915966 0.99747899
0.99831933 0.99831933 0.99831933 0.99831933]
mean value: 0.9978986649327519
key: test_fscore
value: [0.96296296 0.87769784 0.95652174 0.95652174 0.94890511 0.9352518
0.96969697 0.97777778 0.97014925 0.94202899]
mean value: 0.949751417771399
key: train_fscore
value: [0.99664992 0.99747262 0.99748533 0.99748533 0.99916037 0.99748111
0.99831933 0.99831933 0.99831933 0.99832215]
mean value: 0.9979014807088672
key: test_precision
value: [0.94202899 0.84722222 0.91666667 0.91666667 0.91549296 0.89041096
0.96969697 0.95652174 0.95588235 0.90277778]
mean value: 0.9213367297259749
key: train_precision
value: [0.9933222 0.99831366 0.99498328 0.99498328 0.99832215 0.9966443
0.99831933 0.99831933 0.99831933 0.99664992]
mean value: 0.9968176760610129
key: test_recall
value: [0.98484848 0.91044776 1. 1. 0.98484848 0.98484848
0.96969697 1. 0.98484848 0.98484848]
mean value: 0.9804387155133424
key: train_recall
value: [1. 0.996633 1. 1. 1. 0.99831933
0.99831933 0.99831933 0.99831933 1. ]
mean value: 0.9989910307557366
key: test_roc_auc
value: [0.9625735 0.87189055 0.95454545 0.95454545 0.9469697 0.93181818
0.96969697 0.97727273 0.96969697 0.93939394]
mean value: 0.9478403437358661
key: train_roc_auc
value: [0.996633 0.99747616 0.99747899 0.99747899 0.99915966 0.99747899
0.99831933 0.99831933 0.99831933 0.99831933]
mean value: 0.9978983108394873
key: test_jcc
value: [0.92857143 0.78205128 0.91666667 0.91666667 0.90277778 0.87837838
0.94117647 0.95652174 0.94202899 0.89041096]
mean value: 0.9055250354242226
key: train_jcc
value: [0.9933222 0.99495798 0.99498328 0.99498328 0.99832215 0.99497487
0.9966443 0.9966443 0.9966443 0.99664992]
mean value: 0.9958126566226824
MCC on Blind test: 0.79
Accuracy on Blind test: 0.92
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [1.2047646 1.20868587 1.12930322 1.05966353 1.07176399 1.15874434
1.15021348 1.16694856 1.16003871 1.15227175]
mean value: 1.1462398052215577
key: score_time
value: [0.0769465 0.0892241 0.07288265 0.08414268 0.07057261 0.07313824
0.07241249 0.07278395 0.07415533 0.07414198]
mean value: 0.07604005336761474
key: test_mcc
value: [0.80689237 0.77857748 0.69986771 0.83573501 0.83806027 0.72222273
0.85478752 0.85478752 0.84119102 0.78368849]
mean value: 0.8015810125001238
key: train_mcc
value: [0.95661128 0.96170619 0.95501173 0.95684323 0.96194386 0.96487079
0.96173716 0.95684323 0.96509988 0.95858042]
mean value: 0.9599247751275188
key: test_accuracy
value: [0.90225564 0.88721805 0.84848485 0.91666667 0.91666667 0.85606061
0.92424242 0.92424242 0.91666667 0.88636364]
mean value: 0.8978867623604465
key: train_accuracy
value: [0.97813288 0.98065601 0.97731092 0.97815126 0.98067227 0.98235294
0.98067227 0.97815126 0.98235294 0.9789916 ]
mean value: 0.9797444360418683
key: test_fscore
value: [0.90510949 0.89361702 0.85507246 0.91970803 0.92086331 0.86713287
0.92857143 0.92857143 0.92198582 0.8951049 ]
mean value: 0.9035736747628861
key: train_fscore
value: [0.97844113 0.98091286 0.97763049 0.9785124 0.98100743 0.98251457
0.98094449 0.9785124 0.98260149 0.9793559 ]
mean value: 0.9800433162018617
key: test_precision
value: [0.87323944 0.85135135 0.81944444 0.88732394 0.87671233 0.80519481
0.87837838 0.87837838 0.86666667 0.83116883]
mean value: 0.856785856463167
key: train_precision
value: [0.96563011 0.96726678 0.96405229 0.96260163 0.96428571 0.97359736
0.96732026 0.96260163 0.96895425 0.96266234]
mean value: 0.9658972351445866
key: test_recall
value: [0.93939394 0.94029851 0.89393939 0.95454545 0.96969697 0.93939394
0.98484848 0.98484848 0.98484848 0.96969697]
mean value: 0.9561510628674807
key: train_recall
value: [0.99159664 0.99494949 0.99159664 0.99495798 0.99831933 0.99159664
0.99495798 0.99495798 0.99663866 0.99663866]
mean value: 0.9946209999151175
key: test_roc_auc
value: [0.90253279 0.88681592 0.84848485 0.91666667 0.91666667 0.85606061
0.92424242 0.92424242 0.91666667 0.88636364]
mean value: 0.8978742650384441
key: train_roc_auc
value: [0.97812155 0.98066802 0.97731092 0.97815126 0.98067227 0.98235294
0.98067227 0.97815126 0.98235294 0.9789916 ]
mean value: 0.979744503862151
key: test_jcc
value: [0.82666667 0.80769231 0.74683544 0.85135135 0.85333333 0.7654321
0.86666667 0.86666667 0.85526316 0.81012658]
mean value: 0.8250034274353617
key: train_jcc
value: [0.95779221 0.96254072 0.95623987 0.9579288 0.96272285 0.96563011
0.96260163 0.9579288 0.96579805 0.95954693]
mean value: 0.9608729964186585
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [2.13605762 2.16598344 2.45498848 2.29848862 2.23179269 2.26086473
2.53592467 2.4032464 1.9804318 1.53801203]
mean value: 2.200579047203064
key: score_time
value: [0.01504397 0.01493168 0.01365137 0.02563953 0.01351357 0.01368713
0.0136621 0.01364279 0.0099721 0.01013899]
mean value: 0.014388322830200195
key: test_mcc
value: [0.88134139 0.85299767 0.92690611 0.89901011 0.92690611 0.88531564
0.9701425 0.91287093 0.95553309 0.8824419 ]
mean value: 0.909346543788278
key: train_mcc
value: [0.96662567 0.97665016 0.97666924 0.97009871 0.968375 0.9684626
0.97337873 0.97337873 0.97173741 0.97666924]
mean value: 0.9722045482625817
key: test_accuracy
value: [0.93984962 0.92481203 0.96212121 0.9469697 0.96212121 0.93939394
0.98484848 0.95454545 0.97727273 0.93939394]
mean value: 0.9531328320802005
key: train_accuracy
value: [0.98317914 0.9882254 0.98823529 0.98487395 0.98403361 0.98403361
0.98655462 0.98655462 0.98571429 0.98823529]
mean value: 0.985963983574927
key: test_fscore
value: [0.94117647 0.92857143 0.96350365 0.94964029 0.96350365 0.94285714
0.98507463 0.95652174 0.97777778 0.94202899]
mean value: 0.9550655758337795
key: train_fscore
value: [0.9833887 0.98833333 0.98835275 0.98507463 0.98423237 0.98425849
0.98671096 0.98671096 0.98589212 0.98835275]
mean value: 0.9861307055733873
key: test_precision
value: [0.91428571 0.89041096 0.92957746 0.90410959 0.92957746 0.89189189
0.97058824 0.91666667 0.95652174 0.90277778]
mean value: 0.9206407502569274
key: train_precision
value: [0.97208539 0.97854785 0.9785832 0.97217676 0.97213115 0.97058824
0.97536946 0.97536946 0.97377049 0.9785832 ]
mean value: 0.9747205183061565
key: test_recall
value: [0.96969697 0.97014925 1. 1. 1. 1.
1. 1. 1. 0.98484848]
mean value: 0.9924694708276798
key: train_recall
value: [0.99495798 0.9983165 0.99831933 0.99831933 0.99663866 0.99831933
0.99831933 0.99831933 0.99831933 0.99831933]
mean value: 0.9978148431089607
key: test_roc_auc
value: [0.94007237 0.92446857 0.96212121 0.9469697 0.96212121 0.93939394
0.98484848 0.95454545 0.97727273 0.93939394]
mean value: 0.9531207598371777
key: train_roc_auc
value: [0.98316923 0.98823388 0.98823529 0.98487395 0.98403361 0.98403361
0.98655462 0.98655462 0.98571429 0.98823529]
mean value: 0.9859638400814871
key: test_jcc
value: [0.88888889 0.86666667 0.92957746 0.90410959 0.92957746 0.89189189
0.97058824 0.91666667 0.95652174 0.89041096]
mean value: 0.9144899566061336
key: train_jcc
value: [0.96732026 0.97693575 0.97697368 0.97058824 0.96895425 0.96900489
0.97377049 0.97377049 0.97217676 0.97697368]
mean value: 0.9726468500088701
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.04838514 0.05437803 0.05212498 0.05181313 0.09965992 0.10766077
0.0892477 0.06327462 0.06029487 0.05176997]
mean value: 0.06786091327667236
key: score_time
value: [0.01332021 0.01374054 0.01412916 0.01401949 0.02348638 0.0190959
0.0138495 0.01379108 0.01380205 0.02299404]
mean value: 0.016222834587097168
key: test_mcc
value: [0.31255936 0.27144125 0.1767767 0.28629917 0.15027827 0.23664319
0.28629917 0.21821789 0.21038958 0.19682713]
mean value: 0.2345731703657738
key: train_mcc
value: [0.25968885 0.25367309 0.2611946 0.25377296 0.26846242 0.29285905
0.25750387 0.27025687 0.25935415 0.25564355]
mean value: 0.26324094119122415
key: test_accuracy
value: [0.58646617 0.57142857 0.53030303 0.57575758 0.53787879 0.5530303
0.57575758 0.54545455 0.5530303 0.56060606]
mean value: 0.5589712918660287
key: train_accuracy
value: [0.56349874 0.56013457 0.56386555 0.5605042 0.56722689 0.5789916
0.56218487 0.56806723 0.56302521 0.56134454]
mean value: 0.5648843389332183
key: test_fscore
value: [0.70588235 0.70157068 0.68041237 0.70212766 0.67724868 0.69109948
0.70212766 0.6875 0.68783069 0.68478261]
mean value: 0.6920582174067214
key: train_fscore
value: [0.69631363 0.6943308 0.69631363 0.69468768 0.69794721 0.70372561
0.69549971 0.69835681 0.69590643 0.69509346]
mean value: 0.6968174976741559
key: test_precision
value: [0.54545455 0.54032258 0.515625 0.54098361 0.5203252 0.528
0.54098361 0.52380952 0.52845528 0.53389831]
mean value: 0.5317857655913608
key: train_precision
value: [0.53411131 0.53178156 0.53411131 0.53220036 0.53603604 0.54288321
0.53315412 0.53651939 0.53363229 0.53267681]
mean value: 0.5347106393011474
key: test_recall
value: [1. 1. 1. 1. 0.96969697 1.
1. 1. 0.98484848 0.95454545]
mean value: 0.990909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.58955224 0.56818182 0.53030303 0.57575758 0.53787879 0.5530303
0.57575758 0.54545455 0.5530303 0.56060606]
mean value: 0.558955223880597
key: train_roc_auc
value: [0.56313131 0.5605042 0.56386555 0.5605042 0.56722689 0.5789916
0.56218487 0.56806723 0.56302521 0.56134454]
mean value: 0.5648845598845599
key: test_jcc
value: [0.54545455 0.54032258 0.515625 0.54098361 0.512 0.528
0.54098361 0.52380952 0.52419355 0.52066116]
mean value: 0.5292033568435874
key: train_jcc
value: [0.53411131 0.53178156 0.53411131 0.53220036 0.53603604 0.54288321
0.53315412 0.53651939 0.53363229 0.53267681]
mean value: 0.5347106393011474
MCC on Blind test: 0.11
Accuracy on Blind test: 0.38
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.06216955 0.03932023 0.06577659 0.04073477 0.04673862 0.02940083
0.03316712 0.05854964 0.04999566 0.0483737 ]
mean value: 0.0474226713180542
key: score_time
value: [0.02970314 0.0334301 0.03142047 0.02710342 0.0294013 0.01831245
0.020015 0.02023387 0.01992059 0.02014923]
mean value: 0.024968957901000975
key: test_mcc
value: [0.6695318 0.62406697 0.69986771 0.80758535 0.76237471 0.69825325
0.77495429 0.75792383 0.81818182 0.73029674]
mean value: 0.7343036471383053
key: train_mcc
value: [0.77850192 0.79341585 0.77335768 0.77317772 0.7797431 0.77672743
0.77477537 0.77870084 0.76803819 0.77766758]
mean value: 0.7774105692657516
key: test_accuracy
value: [0.83458647 0.81203008 0.84848485 0.90151515 0.87121212 0.84848485
0.88636364 0.87878788 0.90909091 0.86363636]
mean value: 0.8654192298929141
key: train_accuracy
value: [0.8881413 0.89571068 0.88571429 0.88571429 0.88907563 0.88739496
0.88655462 0.88823529 0.88319328 0.88823529]
mean value: 0.8877969623509623
key: test_fscore
value: [0.8358209 0.81481481 0.85507246 0.90647482 0.88435374 0.85294118
0.89051095 0.88059701 0.90909091 0.86956522]
mean value: 0.8699242002529087
key: train_fscore
value: [0.89230769 0.89918699 0.88961039 0.88943089 0.89250814 0.89123377
0.8901546 0.89230769 0.88689992 0.89125102]
mean value: 0.8914891107904296
key: test_precision
value: [0.82352941 0.80882353 0.81944444 0.8630137 0.80246914 0.82857143
0.85915493 0.86764706 0.90909091 0.83333333]
mean value: 0.8415077879450187
key: train_precision
value: [0.8609375 0.86949686 0.86028257 0.86141732 0.8657188 0.86185243
0.86277603 0.8609375 0.85962145 0.86783439]
mean value: 0.8630874856643093
key: test_recall
value: [0.84848485 0.82089552 0.89393939 0.95454545 0.98484848 0.87878788
0.92424242 0.89393939 0.90909091 0.90909091]
mean value: 0.9017865219357757
key: train_recall
value: [0.92605042 0.93097643 0.9210084 0.91932773 0.9210084 0.92268908
0.91932773 0.92605042 0.91596639 0.91596639]
mean value: 0.9218371388959624
key: test_roc_auc
value: [0.83469019 0.81196291 0.84848485 0.90151515 0.87121212 0.84848485
0.88636364 0.87878788 0.90909091 0.86363636]
mean value: 0.8654228855721393
key: train_roc_auc
value: [0.88810939 0.89574032 0.88571429 0.88571429 0.88907563 0.88739496
0.88655462 0.88823529 0.88319328 0.88823529]
mean value: 0.8877967348555583
key: test_jcc
value: [0.71794872 0.6875 0.74683544 0.82894737 0.79268293 0.74358974
0.80263158 0.78666667 0.83333333 0.76923077]
mean value: 0.7709366548004895
key: train_jcc
value: [0.80555556 0.816839 0.80116959 0.80087848 0.80588235 0.80380673
0.80205279 0.80555556 0.79678363 0.80383481]
mean value: 0.8042358482477265
MCC on Blind test: 0.65
Accuracy on Blind test: 0.86
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.45398355 0.64111161 0.66144133 0.64707065 0.74668407 0.61079025
0.59106517 0.64666867 0.64362574 0.66236663]
mean value: 0.6304807662963867
key: score_time
value: [0.02003956 0.02357125 0.02011156 0.01998615 0.02011824 0.02004075
0.02002525 0.02092671 0.02399397 0.02006865]
mean value: 0.020888209342956543
key: test_mcc
value: [0.6695318 0.62406697 0.68568568 0.80758535 0.76237471 0.69825325
0.77495429 0.78824078 0.80386117 0.73029674]
mean value: 0.7344850738674684
key: train_mcc
value: [0.77850192 0.79341585 0.78824973 0.77317772 0.78279168 0.77672743
0.77477537 0.78505932 0.78455181 0.77766758]
mean value: 0.7814918416441247
key: test_accuracy
value: [0.83458647 0.81203008 0.84090909 0.90151515 0.87121212 0.84848485
0.88636364 0.89393939 0.90151515 0.86363636]
mean value: 0.8654192298929141
key: train_accuracy
value: [0.8881413 0.89571068 0.89327731 0.88571429 0.8907563 0.88739496
0.88655462 0.89159664 0.89159664 0.88823529]
mean value: 0.8898978026870967
key: test_fscore
value: [0.8358209 0.81481481 0.84892086 0.90647482 0.88435374 0.85294118
0.89051095 0.89552239 0.9037037 0.86956522]
mean value: 0.8702628569817447
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:156: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:159: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: train_fscore
value: [0.89230769 0.89918699 0.89666395 0.88943089 0.89379085 0.89123377
0.8901546 0.89520715 0.89469388 0.89125102]
mean value: 0.8933920794349053
key: test_precision
value: [0.82352941 0.80882353 0.80821918 0.8630137 0.80246914 0.82857143
0.85915493 0.88235294 0.88405797 0.83333333]
mean value: 0.8393525557364458
key: train_precision
value: [0.8609375 0.86949686 0.86908517 0.86141732 0.86963434 0.86185243
0.86277603 0.8663522 0.86984127 0.86783439]
mean value: 0.8659227516425898
key: test_recall
value: [0.84848485 0.82089552 0.89393939 0.95454545 0.98484848 0.87878788
0.92424242 0.90909091 0.92424242 0.90909091]
mean value: 0.9048168249660787
key: train_recall
value: [0.92605042 0.93097643 0.92605042 0.91932773 0.91932773 0.92268908
0.91932773 0.92605042 0.9210084 0.91596639]
mean value: 0.9226774750304162
key: test_roc_auc
value: [0.83469019 0.81196291 0.84090909 0.90151515 0.87121212 0.84848485
0.88636364 0.89393939 0.90151515 0.86363636]
mean value: 0.8654228855721393
key: train_roc_auc
value: [0.88810939 0.89574032 0.89327731 0.88571429 0.8907563 0.88739496
0.88655462 0.89159664 0.89159664 0.88823529]
mean value: 0.8898975751916928
key: test_jcc
value: [0.71794872 0.6875 0.7375 0.82894737 0.79268293 0.74358974
0.80263158 0.81081081 0.82432432 0.76923077]
mean value: 0.7715166240102056
key: train_jcc
value: [0.80555556 0.816839 0.81268437 0.80087848 0.80797637 0.80380673
0.80205279 0.81029412 0.80945347 0.80383481]
mean value: 0.8073375678553497
MCC on Blind test: 0.65
Accuracy on Blind test: 0.86
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03648067 0.06293559 0.11051536 0.10283589 0.07630587 0.05248928
0.04713821 0.03610635 0.04521561 0.04661322]
mean value: 0.06166360378265381
key: score_time
value: [0.0122385 0.0123806 0.02309442 0.0224843 0.01422787 0.01225924
0.01240015 0.01217985 0.0121851 0.01242042]
mean value: 0.014587044715881348
key: test_mcc
value: [0.79666667 0.60104076 0.51854497 0.63333333 0.63819901 0.71889189
0.42833333 0.6363961 0.62994079 0.71393289]
mean value: 0.6315279761986434
key: train_mcc
value: [0.72854342 0.73336204 0.78336839 0.72770459 0.77058957 0.73776759
0.74378449 0.73776759 0.76518728 0.72920376]
mean value: 0.7457278720621128
key: test_accuracy
value: [0.89795918 0.79591837 0.75510204 0.81632653 0.81632653 0.85714286
0.71428571 0.81632653 0.8125 0.85416667]
mean value: 0.8136054421768707
key: train_accuracy
value: [0.86332574 0.86560364 0.89066059 0.86332574 0.88382688 0.86788155
0.87015945 0.86788155 0.88181818 0.86363636]
mean value: 0.8718119693518327
key: test_fscore
value: [0.89795918 0.80769231 0.76923077 0.81632653 0.80851064 0.86792453
0.72 0.83018868 0.82352941 0.8627451 ]
mean value: 0.8204107146857755
key: train_fscore
value: [0.86842105 0.87089716 0.89473684 0.86725664 0.88840263 0.8722467
0.87581699 0.8722467 0.88546256 0.86842105]
mean value: 0.8763908306318798
key: test_precision
value: [0.88 0.75 0.71428571 0.8 0.86363636 0.82142857
0.72 0.78571429 0.77777778 0.81481481]
mean value: 0.7927657527657528
key: train_precision
value: [0.83898305 0.83966245 0.86440678 0.84482759 0.85294118 0.84255319
0.8375 0.84255319 0.85897436 0.83898305]
mean value: 0.8461384833243883
key: test_recall
value: [0.91666667 0.875 0.83333333 0.83333333 0.76 0.92
0.72 0.88 0.875 0.91666667]
mean value: 0.853
key: train_recall
value: [0.9 0.90454545 0.92727273 0.89090909 0.92694064 0.90410959
0.91780822 0.90410959 0.91363636 0.9 ]
mean value: 0.9089331672893317
key: test_roc_auc
value: [0.89833333 0.7975 0.75666667 0.81666667 0.8175 0.85583333
0.71416667 0.815 0.8125 0.85416667]
mean value: 0.8138333333333333
key: train_roc_auc
value: [0.86324201 0.86551474 0.890577 0.86326276 0.88392487 0.86796389
0.87026775 0.86796389 0.88181818 0.86363636]
mean value: 0.8718171440431715
key: test_jcc
value: [0.81481481 0.67741935 0.625 0.68965517 0.67857143 0.76666667
0.5625 0.70967742 0.7 0.75862069]
mean value: 0.6982925546315424
key: train_jcc
value: [0.76744186 0.77131783 0.80952381 0.765625 0.7992126 0.7734375
0.77906977 0.7734375 0.7944664 0.76744186]
mean value: 0.7800974128940519
MCC on Blind test: 0.7
Accuracy on Blind test: 0.88
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.82624602 0.93945646 0.98582363 0.9221282 1.18374133 1.19921923
0.99356842 1.23624206 1.00866294 1.01761222]
mean value: 1.0312700510025024
key: score_time
value: [0.01234579 0.0122726 0.01226354 0.01245856 0.02000523 0.01238537
0.01224136 0.01265335 0.0125165 0.0130868 ]
mean value: 0.013222908973693848
key: test_mcc
value: [0.755 0.51854497 0.48142509 0.67333333 0.55166667 0.71889189
0.42833333 0.6363961 0.62994079 0.71393289]
mean value: 0.6107465072859045
key: train_mcc
value: [0.69679909 0.71561303 0.74250278 0.7006985 0.80039507 0.70658033
0.72615613 0.69802628 0.71631274 0.72543774]
mean value: 0.7228521690037979
key: test_accuracy
value: [0.87755102 0.75510204 0.73469388 0.83673469 0.7755102 0.85714286
0.71428571 0.81632653 0.8125 0.85416667]
mean value: 0.8034013605442176
key: train_accuracy
value: [0.84738041 0.85649203 0.87015945 0.84965831 0.89977221 0.85193622
0.86104784 0.84738041 0.85681818 0.86136364]
mean value: 0.8602008697452889
key: test_fscore
value: [0.875 0.76923077 0.75471698 0.83333333 0.7755102 0.86792453
0.72 0.83018868 0.82352941 0.8627451 ]
mean value: 0.8112179005128902
key: train_fscore
value: [0.85339168 0.8627451 0.87527352 0.85462555 0.90178571 0.85776805
0.86767896 0.8540305 0.8627451 0.8671024 ]
mean value: 0.8657146577807547
key: test_precision
value: [0.875 0.71428571 0.68965517 0.83333333 0.79166667 0.82142857
0.72 0.78571429 0.77777778 0.81481481]
mean value: 0.7823676336434957
key: train_precision
value: [0.82278481 0.82845188 0.84388186 0.82905983 0.88209607 0.82352941
0.82644628 0.81666667 0.82845188 0.83263598]
mean value: 0.8334004673972575
key: test_recall
value: [0.875 0.83333333 0.83333333 0.83333333 0.76 0.92
0.72 0.88 0.875 0.91666667]
mean value: 0.8446666666666667
key: train_recall
value: [0.88636364 0.9 0.90909091 0.88181818 0.92237443 0.89497717
0.91324201 0.89497717 0.9 0.90454545]
mean value: 0.900738895807389
key: test_roc_auc
value: [0.8775 0.75666667 0.73666667 0.83666667 0.77583333 0.85583333
0.71416667 0.815 0.8125 0.85416667]
mean value: 0.8035
key: train_roc_auc
value: [0.84729141 0.85639269 0.87007057 0.84958489 0.89982358 0.85203404
0.86116646 0.84748858 0.85681818 0.86136364]
mean value: 0.860203403902034
key: test_jcc
value: [0.77777778 0.625 0.60606061 0.71428571 0.63333333 0.76666667
0.5625 0.70967742 0.7 0.75862069]
mean value: 0.6853922207134109
key: train_jcc
value: [0.74427481 0.75862069 0.77821012 0.74615385 0.82113821 0.75095785
0.76628352 0.74524715 0.75862069 0.76538462]
mean value: 0.7634891505722061
MCC on Blind test: 0.67
Accuracy on Blind test: 0.86
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.0174191 0.01351547 0.01054788 0.01021028 0.01021981 0.01003218
0.01009798 0.01021171 0.00998211 0.01007557]
mean value: 0.011231207847595214
key: score_time
value: [0.01329041 0.01151371 0.00949287 0.00919437 0.00917673 0.00907946
0.00908232 0.00900674 0.0090127 0.00906205]
mean value: 0.009791135787963867
key: test_mcc
value: [0.55091896 0.48659733 0.39836231 0.40083017 0.35355339 0.55166667
0.2710576 0.55166667 0.37796447 0.54213748]
mean value: 0.4484755049286746
key: train_mcc
value: [0.49626427 0.53329675 0.49606557 0.50801433 0.57619616 0.48932809
0.55172318 0.47215524 0.53099395 0.49437368]
mean value: 0.5148411209141024
key: test_accuracy
value: [0.7755102 0.73469388 0.69387755 0.69387755 0.67346939 0.7755102
0.63265306 0.7755102 0.6875 0.77083333]
mean value: 0.7213435374149659
key: train_accuracy
value: [0.74715262 0.76537585 0.74715262 0.75170843 0.78359909 0.74259681
0.77220957 0.73348519 0.76363636 0.74545455]
mean value: 0.7552371091323256
key: test_fscore
value: [0.76595745 0.68292683 0.71698113 0.63414634 0.65217391 0.7755102
0.60869565 0.7755102 0.66666667 0.76595745]
mean value: 0.7044525836471524
key: train_fscore
value: [0.73634204 0.75417661 0.75816993 0.73479319 0.76190476 0.72371638
0.75124378 0.71111111 0.74879227 0.7294686 ]
mean value: 0.740971868081603
key: test_precision
value: [0.7826087 0.82352941 0.65517241 0.76470588 0.71428571 0.79166667
0.66666667 0.79166667 0.71428571 0.7826087 ]
mean value: 0.7487196527786527
key: train_precision
value: [0.77114428 0.79396985 0.72803347 0.79057592 0.84444444 0.77894737
0.82513661 0.77419355 0.79896907 0.77835052]
mean value: 0.7883765077790228
key: test_recall
value: [0.75 0.58333333 0.79166667 0.54166667 0.6 0.76
0.56 0.76 0.625 0.75 ]
mean value: 0.6721666666666667
key: train_recall
value: [0.70454545 0.71818182 0.79090909 0.68636364 0.69406393 0.67579909
0.68949772 0.65753425 0.70454545 0.68636364]
mean value: 0.7007804068078041
key: test_roc_auc
value: [0.775 0.73166667 0.69583333 0.69083333 0.675 0.77583333
0.63416667 0.77583333 0.6875 0.77083333]
mean value: 0.7212500000000001
key: train_roc_auc
value: [0.7472499 0.7654836 0.74705272 0.75185762 0.7833956 0.742445
0.77202159 0.73331258 0.76363636 0.74545455]
mean value: 0.7551909506019095
key: test_jcc
value: [0.62068966 0.51851852 0.55882353 0.46428571 0.48387097 0.63333333
0.4375 0.63333333 0.5 0.62068966]
mean value: 0.5471044706969427
key: train_jcc
value: [0.58270677 0.60536398 0.61052632 0.58076923 0.61538462 0.56704981
0.60159363 0.55172414 0.5984556 0.57414449]
mean value: 0.5887718570540718
MCC on Blind test: 0.45
Accuracy on Blind test: 0.78
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.01025057 0.01029849 0.01029444 0.01029325 0.01027894 0.01024628
0.01033664 0.01036143 0.01040602 0.01041937]
mean value: 0.010318541526794433
key: score_time
value: [0.00901389 0.00901246 0.00904012 0.009022 0.00901365 0.00907755
0.0090878 0.00918317 0.00921178 0.00922823]
mean value: 0.009089064598083497
key: test_mcc
value: [0.715 0.38833333 0.39196475 0.51252158 0.34666667 0.59839104
0.63333333 0.55166667 0.58536941 0.58333333]
mean value: 0.5306580112563036
key: train_mcc
value: [0.59080781 0.61303798 0.60465079 0.62803099 0.60397855 0.62220982
0.59975096 0.59009009 0.5917901 0.56832862]
mean value: 0.6012675717929086
key: test_accuracy
value: [0.85714286 0.69387755 0.69387755 0.75510204 0.67346939 0.79591837
0.81632653 0.7755102 0.79166667 0.79166667]
mean value: 0.7644557823129251
key: train_accuracy
value: [0.79498861 0.80637813 0.80182232 0.81321185 0.80182232 0.81093394
0.79954442 0.79498861 0.79545455 0.78409091]
mean value: 0.8003235659556844
key: test_fscore
value: [0.85714286 0.69387755 0.70588235 0.76 0.68 0.81481481
0.81632653 0.7755102 0.8 0.79166667]
mean value: 0.7695220977279801
key: train_fscore
value: [0.80088496 0.8098434 0.80794702 0.82017544 0.80449438 0.81348315
0.80357143 0.79638009 0.80088496 0.7816092 ]
mean value: 0.8039274012977246
key: test_precision
value: [0.84 0.68 0.66666667 0.73076923 0.68 0.75862069
0.83333333 0.79166667 0.76923077 0.79166667]
mean value: 0.7541954022988506
key: train_precision
value: [0.78017241 0.79735683 0.78540773 0.79237288 0.7920354 0.80088496
0.7860262 0.78923767 0.78017241 0.79069767]
mean value: 0.7894364159893563
key: test_recall
value: [0.875 0.70833333 0.75 0.79166667 0.68 0.88
0.8 0.76 0.83333333 0.79166667]
mean value: 0.787
key: train_recall
value: [0.82272727 0.82272727 0.83181818 0.85 0.8173516 0.82648402
0.82191781 0.80365297 0.82272727 0.77272727]
mean value: 0.8192133665421336
key: test_roc_auc
value: [0.8575 0.69416667 0.695 0.75583333 0.67333333 0.79416667
0.81666667 0.77583333 0.79166667 0.79166667]
mean value: 0.7645833333333334
key: train_roc_auc
value: [0.79492528 0.80634081 0.80175384 0.81312785 0.80185762 0.81096928
0.79959527 0.7950083 0.79545455 0.78409091]
mean value: 0.8003123702781237
key: test_jcc
value: [0.75 0.53125 0.54545455 0.61290323 0.51515152 0.6875
0.68965517 0.63333333 0.66666667 0.65517241]
mean value: 0.6287086872619408
key: train_jcc
value: [0.66789668 0.68045113 0.67777778 0.69516729 0.67293233 0.68560606
0.67164179 0.66165414 0.66789668 0.64150943]
mean value: 0.6722533301554774
MCC on Blind test: 0.61
Accuracy on Blind test: 0.84
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.01000905 0.01069713 0.01083493 0.01071715 0.0106926 0.01064563
0.01084876 0.01088238 0.01057363 0.01007628]
mean value: 0.010597753524780273
key: score_time
value: [0.0182538 0.01808405 0.01793838 0.01800442 0.01830792 0.01789689
0.01869392 0.01868653 0.01774597 0.01882052]
mean value: 0.018243241310119628
key: test_mcc
value: [0.14333333 0.34666667 0.34891534 0.05892557 0.22370649 0.47
0.39196475 0.38731273 0.46195658 0.58536941]
mean value: 0.34181508579127046
key: train_mcc
value: [0.58110664 0.58999108 0.54898298 0.56719733 0.59933565 0.58999108
0.56745316 0.59908676 0.54581553 0.54095939]
mean value: 0.5729919588657677
key: test_accuracy
value: [0.57142857 0.67346939 0.67346939 0.53061224 0.6122449 0.73469388
0.69387755 0.69387755 0.72916667 0.79166667]
mean value: 0.6704506802721089
key: train_accuracy
value: [0.7904328 0.79498861 0.77448747 0.78359909 0.79954442 0.79498861
0.78359909 0.79954442 0.77272727 0.77045455]
mean value: 0.7864366328432387
key: test_fscore
value: [0.57142857 0.66666667 0.68 0.48888889 0.62745098 0.73469388
0.68085106 0.70588235 0.74509804 0.7826087 ]
mean value: 0.6683569136566129
key: train_fscore
value: [0.78801843 0.79638009 0.77448747 0.7845805 0.8018018 0.79357798
0.77958237 0.79908676 0.77678571 0.76887872]
mean value: 0.7863179834924426
key: test_precision
value: [0.56 0.66666667 0.65384615 0.52380952 0.61538462 0.75
0.72727273 0.69230769 0.7037037 0.81818182]
mean value: 0.6711172901172902
key: train_precision
value: [0.79906542 0.79279279 0.77625571 0.78280543 0.79111111 0.79723502
0.79245283 0.79908676 0.76315789 0.77419355]
mean value: 0.7868156516436422
key: test_recall
value: [0.58333333 0.66666667 0.70833333 0.45833333 0.64 0.72
0.64 0.72 0.79166667 0.75 ]
mean value: 0.6678333333333333
key: train_recall
value: [0.77727273 0.8 0.77272727 0.78636364 0.81278539 0.78995434
0.76712329 0.79908676 0.79090909 0.76363636]
mean value: 0.7859858862598589
key: test_roc_auc
value: [0.57166667 0.67333333 0.67416667 0.52916667 0.61166667 0.735
0.695 0.69333333 0.72916667 0.79166667]
mean value: 0.6704166666666667
key: train_roc_auc
value: [0.79046285 0.79497717 0.77449149 0.78359278 0.79957451 0.79497717
0.78356164 0.79954338 0.77272727 0.77045455]
mean value: 0.7864362806143628
key: test_jcc
value: [0.4 0.5 0.51515152 0.32352941 0.45714286 0.58064516
0.51612903 0.54545455 0.59375 0.64285714]
mean value: 0.5074659665919153
key: train_jcc
value: [0.65019011 0.66165414 0.63197026 0.64552239 0.66917293 0.65779468
0.63878327 0.66539924 0.6350365 0.62453532]
mean value: 0.6480058828667646
MCC on Blind test: 0.4
Accuracy on Blind test: 0.73
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.02519631 0.02429771 0.02343082 0.02211881 0.02026248 0.02048516
0.02434134 0.02253342 0.02340245 0.02184677]
mean value: 0.022791528701782228
key: score_time
value: [0.01306438 0.01318479 0.0132587 0.01345491 0.01253915 0.01197505
0.0122962 0.01304913 0.01205301 0.01192784]
mean value: 0.012680315971374511
key: test_mcc
value: [0.83973406 0.49255205 0.55612092 0.7145252 0.59166667 0.74389879
0.51089422 0.55390031 0.59160798 0.72422435]
mean value: 0.6319124563642237
key: train_mcc
value: [0.68661164 0.72255181 0.7150502 0.70081451 0.70386925 0.69938716
0.72442006 0.66799508 0.69328082 0.68243161]
mean value: 0.69964121420511
key: test_accuracy
value: [0.91836735 0.73469388 0.7755102 0.85714286 0.79591837 0.85714286
0.75510204 0.7755102 0.79166667 0.85416667]
mean value: 0.8115221088435374
key: train_accuracy
value: [0.8405467 0.85876993 0.85421412 0.84738041 0.84738041 0.84738041
0.85876993 0.82915718 0.84318182 0.83863636]
mean value: 0.8465417270656451
key: test_fscore
value: [0.92 0.76363636 0.78431373 0.85106383 0.8 0.87719298
0.76923077 0.79245283 0.80769231 0.86792453]
mean value: 0.8233507336783576
key: train_fscore
value: [0.85042735 0.86695279 0.86382979 0.85714286 0.85835095 0.85529158
0.86752137 0.84210526 0.85350318 0.84796574]
mean value: 0.8563090866702563
key: test_precision
value: [0.88461538 0.67741935 0.74074074 0.86956522 0.8 0.78125
0.74074074 0.75 0.75 0.79310345]
mean value: 0.7787434886602742
key: train_precision
value: [0.80241935 0.82113821 0.812 0.80722892 0.7992126 0.81147541
0.81526104 0.78125 0.80079681 0.80161943]
mean value: 0.8052401780268827
key: test_recall
value: [0.95833333 0.875 0.83333333 0.83333333 0.8 1.
0.8 0.84 0.875 0.95833333]
mean value: 0.8773333333333333
key: train_recall
value: [0.90454545 0.91818182 0.92272727 0.91363636 0.92694064 0.90410959
0.92694064 0.91324201 0.91363636 0.9 ]
mean value: 0.9143960149439602
key: test_roc_auc
value: [0.91916667 0.7375 0.77666667 0.85666667 0.79583333 0.85416667
0.75416667 0.77416667 0.79166667 0.85416667]
mean value: 0.8114166666666667
key: train_roc_auc
value: [0.84040058 0.85863429 0.8540577 0.84722914 0.84756123 0.84750934
0.85892487 0.82934828 0.84318182 0.83863636]
mean value: 0.8465483603154836
key: test_jcc
value: [0.85185185 0.61764706 0.64516129 0.74074074 0.66666667 0.78125
0.625 0.65625 0.67741935 0.76666667]
mean value: 0.7028653629910746
key: train_jcc
value: [0.73977695 0.76515152 0.76029963 0.75 0.75185185 0.74716981
0.76603774 0.72727273 0.74444444 0.73605948]
mean value: 0.7488064142585281
MCC on Blind test: 0.68
Accuracy on Blind test: 0.86
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [0.61014509 2.33109617 2.37079334 0.93144178 2.48047662 2.4058454
2.36077499 2.46738005 2.3683176 1.95464587]
mean value: 2.0280916929244994
key: score_time
value: [0.01267934 0.01379967 0.02186561 0.01337576 0.01495171 0.01306129
0.01289058 0.01424766 0.01542616 0.01264715]
mean value: 0.014494490623474122
key: test_mcc
value: [0.79666667 0.51 0.43071846 0.6750504 0.55166667 0.7145252
0.46911585 0.55390031 0.54594868 0.50709255]
mean value: 0.5754684792348235
key: train_mcc
value: [0.7180484 0.95013496 0.96355334 0.76321363 0.95948845 0.93167289
0.96371487 0.95013496 0.96367619 0.83223957]
mean value: 0.8995877269556399
key: test_accuracy
value: [0.89795918 0.75510204 0.71428571 0.83673469 0.7755102 0.85714286
0.73469388 0.7755102 0.77083333 0.75 ]
mean value: 0.7867772108843537
key: train_accuracy
value: [0.85421412 0.97494305 0.98177677 0.88154897 0.97949886 0.96583144
0.98177677 0.97494305 0.98181818 0.91590909]
mean value: 0.9492260302340029
key: test_fscore
value: [0.89795918 0.75 0.72 0.82608696 0.7755102 0.8627451
0.74509804 0.79245283 0.78431373 0.76923077]
mean value: 0.7923396806441387
key: train_fscore
value: [0.86554622 0.97471264 0.98181818 0.88288288 0.97977528 0.96583144
0.98190045 0.9751693 0.98190045 0.91722595]
mean value: 0.9506762798831331
key: test_precision
value: [0.88 0.75 0.69230769 0.86363636 0.79166667 0.84615385
0.73076923 0.75 0.74074074 0.71428571]
mean value: 0.7759560254560255
key: train_precision
value: [0.8046875 0.98604651 0.98181818 0.875 0.96460177 0.96363636
0.97309417 0.96428571 0.97747748 0.9030837 ]
mean value: 0.9393731389601265
key: test_recall
value: [0.91666667 0.75 0.75 0.79166667 0.76 0.88
0.76 0.84 0.83333333 0.83333333]
mean value: 0.8115
key: train_recall
value: [0.93636364 0.96363636 0.98181818 0.89090909 0.99543379 0.96803653
0.99086758 0.98630137 0.98636364 0.93181818]
mean value: 0.9631548360315483
key: test_roc_auc
value: [0.89833333 0.755 0.715 0.83583333 0.77583333 0.85666667
0.73416667 0.77416667 0.77083333 0.75 ]
mean value: 0.7865833333333333
key: train_roc_auc
value: [0.85402657 0.97496887 0.98177667 0.8815276 0.97953508 0.96583645
0.98179743 0.97496887 0.98181818 0.91590909]
mean value: 0.9492164798671648
key: test_jcc
value: [0.81481481 0.6 0.5625 0.7037037 0.63333333 0.75862069
0.59375 0.65625 0.64516129 0.625 ]
mean value: 0.6593133831829605
key: train_jcc
value: [0.76296296 0.95067265 0.96428571 0.79032258 0.96035242 0.9339207
0.96444444 0.95154185 0.96444444 0.84710744]
mean value: 0.9090055208512735
MCC on Blind test: 0.68
Accuracy on Blind test: 0.86
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.03068066 0.02850795 0.02653122 0.02450705 0.02715755 0.02531695
0.0240128 0.02486944 0.02457714 0.02573967]
mean value: 0.02619004249572754
key: score_time
value: [0.01240253 0.00964069 0.00914097 0.00925875 0.00932145 0.00931454
0.00973344 0.00957346 0.00942445 0.00928974]
mean value: 0.009710001945495605
key: test_mcc
value: [0.63819901 0.67612782 0.63333333 0.63819901 0.67333333 0.55091896
0.7202771 0.68353656 0.75261781 0.9591663 ]
mean value: 0.6925709246850608
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.81632653 0.83673469 0.81632653 0.81632653 0.83673469 0.7755102
0.85714286 0.83673469 0.875 0.97916667]
mean value: 0.8446003401360545
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.82352941 0.84 0.81632653 0.82352941 0.84 0.78431373
0.85106383 0.82608696 0.86956522 0.9787234 ]
mean value: 0.8453138487587449
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.77777778 0.80769231 0.8 0.77777778 0.84 0.76923077
0.90909091 0.9047619 0.90909091 1. ]
mean value: 0.8495422355422355
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 0.83333333 0.875 0.84 0.8
0.8 0.76 0.83333333 0.95833333]
mean value: 0.845
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8175 0.8375 0.81666667 0.8175 0.83666667 0.775
0.85833333 0.83833333 0.875 0.97916667]
mean value: 0.8451666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.7 0.72413793 0.68965517 0.7 0.72413793 0.64516129
0.74074074 0.7037037 0.76923077 0.95833333]
mean value: 0.7355100871813887
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.73
Accuracy on Blind test: 0.89
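Note: the per-fold scores above can be reproduced by hand from a confusion matrix. The sketch below does this for fold 10 of the Decision Tree record. The counts (TP=23, FN=1, FP=0, TN=24) do not appear in the log; they are inferred from the logged accuracy, precision and recall, so treat them as an assumption. The logged test_roc_auc matches (recall + specificity) / 2, which is what ROC AUC reduces to when scored on hard class labels. Whether the script scores it that way is also an inference.

```python
from math import sqrt

# Assumed counts for fold 10 (48 samples), inferred from the logged
# accuracy 0.97916667, precision 1.0 and recall 0.95833333.
tp, fn, fp, tn = 23, 1, 0, 24

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
fscore = 2 * precision * recall / (precision + recall)
jcc = tp / (tp + fp + fn)                # Jaccard index of the positive class
roc_auc = (recall + specificity) / 2     # ROC AUC when scored on hard labels
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [
    ("test_mcc", mcc), ("test_accuracy", accuracy), ("test_fscore", fscore),
    ("test_precision", precision), ("test_recall", recall),
    ("test_roc_auc", roc_auc), ("test_jcc", jcc),
]:
    print(f"{name}: {value:.8f}")
```

Under these assumed counts, each printed value should agree with the last entry of the corresponding "value:" array in the record above.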
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.13182998 0.12682986 0.12640691 0.12741017 0.12682033 0.12823749
0.12678123 0.12986279 0.12703943 0.12754059]
mean value: 0.1278758764266968
key: score_time
value: [0.01910758 0.01841569 0.01830077 0.01848006 0.01812434 0.01878262
0.01830769 0.01837826 0.01823258 0.01816678]
mean value: 0.01842963695526123
key: test_mcc
value: [0.755 0.60104076 0.51252158 0.5943247 0.43604918 0.63333333
0.51089422 0.5943247 0.75261781 0.6761234 ]
mean value: 0.6066229699611482
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.87755102 0.79591837 0.75510204 0.79591837 0.71428571 0.81632653
0.75510204 0.79591837 0.875 0.83333333]
mean value: 0.8014455782312925
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.875 0.80769231 0.76 0.8 0.69565217 0.81632653
0.76923077 0.79166667 0.88 0.84615385]
mean value: 0.8041722294268878
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.875 0.75 0.73076923 0.76923077 0.76190476 0.83333333
0.74074074 0.82608696 0.84615385 0.78571429]
mean value: 0.7918933924368707
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.875 0.875 0.79166667 0.83333333 0.64 0.8
0.8 0.76 0.91666667 0.91666667]
mean value: 0.8208333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.8775 0.7975 0.75583333 0.79666667 0.71583333 0.81666667
0.75416667 0.79666667 0.875 0.83333333]
mean value: 0.8019166666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.77777778 0.67741935 0.61290323 0.66666667 0.53333333 0.68965517
0.625 0.65517241 0.78571429 0.73333333]
mean value: 0.6756975563677454
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.67
Accuracy on Blind test: 0.86
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01058841 0.01049209 0.01040554 0.01045561 0.01053834 0.01071095
0.01147485 0.01093698 0.01081371 0.01044965]
mean value: 0.010686612129211426
key: score_time
value: [0.00906849 0.00916314 0.00917268 0.0091629 0.00955749 0.00915098
0.00915647 0.00910068 0.00912881 0.00956321]
mean value: 0.00922248363494873
key: test_mcc
value: [0.27701416 0.34673805 0.34666667 0.31529953 0.22780857 0.05892557
0.06166667 0.43071846 0.39204616 0.3380617 ]
mean value: 0.2794945523300344
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.63265306 0.67346939 0.67346939 0.65306122 0.6122449 0.53061224
0.53061224 0.71428571 0.6875 0.66666667]
mean value: 0.6374574829931973
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.66666667 0.65217391 0.66666667 0.67924528 0.59574468 0.56603774
0.53061224 0.70833333 0.72727273 0.69230769]
mean value: 0.6485060943907512
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.6 0.68181818 0.66666667 0.62068966 0.63636364 0.53571429
0.54166667 0.73913043 0.64516129 0.64285714]
mean value: 0.6310067960364183
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.75 0.625 0.66666667 0.75 0.56 0.6
0.52 0.68 0.83333333 0.75 ]
mean value: 0.6735
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.635 0.6725 0.67333333 0.655 0.61333333 0.52916667
0.53083333 0.715 0.6875 0.66666667]
mean value: 0.6378333333333334
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.5 0.48387097 0.5 0.51428571 0.42424242 0.39473684
0.36111111 0.5483871 0.57142857 0.52941176]
mean value: 0.4827474492395096
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.4
Accuracy on Blind test: 0.74
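Note: the "key:" / "value:" / "mean value:" blocks in this log can be read back programmatically, for example to compare train and test scores. The sketch below parses a short excerpt copied from the Extra Tree record and computes the gap between train_mcc and test_mcc. The function name parse_cv_blocks and the regular expression are illustrative assumptions and are not part of the original script.

```python
import re
from statistics import mean

# Excerpt copied from the Extra Tree record in this log.
LOG_EXCERPT = """\
key: test_mcc
value: [0.27701416 0.34673805 0.34666667 0.31529953 0.22780857 0.05892557
 0.06166667 0.43071846 0.39204616 0.3380617 ]
mean value: 0.2794945523300344
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
"""


def parse_cv_blocks(text):
    """Map each 'key:' name to the list of per-fold floats that follows it."""
    pattern = re.compile(r"key: (\w+)\s+value: \[([^\]]*)\]")
    return {
        name: [float(token) for token in body.split()]
        for name, body in pattern.findall(text)
    }


scores = parse_cv_blocks(LOG_EXCERPT)
test_mean = mean(scores["test_mcc"])
train_mean = mean(scores["train_mcc"])
gap = train_mean - test_mean
print(f"test_mcc mean={test_mean:.4f} train_mcc mean={train_mean:.4f} gap={gap:.4f}")
```

A train score of 1.0 with a much lower test score, as in this record, is the usual signature of a model that has memorised the training folds.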
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [1.87583995 1.81410217 1.99092579 2.11582375 2.0683229 2.05223727
2.08664584 2.0749619 2.19409323 2.02771664]
mean value: 2.0300669431686402
key: score_time
value: [0.09192443 0.09313083 0.09533286 0.10319018 0.09989047 0.20439339
0.092803 0.10103011 0.12323284 0.10469651]
mean value: 0.11096246242523193
key: test_mcc
value: [0.80235519 0.88443328 0.715 0.76603235 0.63333333 0.755
0.7145252 0.7202771 0.70894901 0.91666667]
mean value: 0.7616572124350064
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.89795918 0.93877551 0.85714286 0.87755102 0.81632653 0.87755102
0.85714286 0.85714286 0.85416667 0.95833333]
mean value: 0.8792091836734693
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.90196078 0.94117647 0.85714286 0.88461538 0.81632653 0.88
0.8627451 0.85106383 0.85714286 0.95833333]
mean value: 0.8810507145575088
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85185185 0.88888889 0.84 0.82142857 0.83333333 0.88
0.84615385 0.90909091 0.84 0.95833333]
mean value: 0.8669080734080734
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95833333 1. 0.875 0.95833333 0.8 0.88
0.88 0.8 0.875 0.95833333]
mean value: 0.8985000000000001
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.89916667 0.94 0.8575 0.87916667 0.81666667 0.8775
0.85666667 0.85833333 0.85416667 0.95833333]
mean value: 0.87975
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.82142857 0.88888889 0.75 0.79310345 0.68965517 0.78571429
0.75862069 0.74074074 0.75 0.92 ]
mean value: 0.7898151797117314
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.28804898 1.22133827 1.22952151 1.21814561 1.28684974 1.25389266
1.22434378 1.77762294 2.32905412 0.99304485]
mean value: 1.3821862459182739
key: score_time
value: [0.14605403 0.16223454 0.17510486 0.18274355 0.15737247 0.18217468
0.25552583 0.20922709 0.21482563 0.25939536]
mean value: 0.1944658041000366
key: test_mcc
value: [0.83973406 0.88443328 0.67333333 0.83973406 0.63333333 0.755
0.6750504 0.755 0.70894901 0.87576054]
mean value: 0.7640328006862502
key: train_mcc
value: [0.89566388 0.90948015 0.9227863 0.89566388 0.9230259 0.92333442
0.9230259 0.90949509 0.90942919 0.91387241]
mean value: 0.9125777123466576
key: test_accuracy
value: [0.91836735 0.93877551 0.83673469 0.91836735 0.81632653 0.87755102
0.83673469 0.87755102 0.85416667 0.9375 ]
mean value: 0.8812074829931973
key: train_accuracy
value: [0.9476082 0.95444191 0.96127563 0.9476082 0.96127563 0.96127563
0.96127563 0.95444191 0.95454545 0.95681818]
mean value: 0.9560566369848831
key: test_fscore
value: [0.92 0.94117647 0.83333333 0.92 0.81632653 0.88
0.84615385 0.88 0.85714286 0.93877551]
mean value: 0.8832908548034598
key: train_fscore
value: [0.94854586 0.95535714 0.96179775 0.94854586 0.96179775 0.96196868
0.96179775 0.95515695 0.95515695 0.95730337]
mean value: 0.9567428076100482
key: test_precision
value: [0.88461538 0.88888889 0.83333333 0.88461538 0.83333333 0.88
0.81481481 0.88 0.84 0.92 ]
mean value: 0.865960113960114
key: train_precision
value: [0.9339207 0.93859649 0.95111111 0.9339207 0.94690265 0.94298246
0.94690265 0.93832599 0.94247788 0.94666667]
mean value: 0.9421807311867965
key: test_recall
value: [0.95833333 1. 0.83333333 0.95833333 0.8 0.88
0.88 0.88 0.875 0.95833333]
mean value: 0.9023333333333333
key: train_recall
value: [0.96363636 0.97272727 0.97272727 0.96363636 0.97716895 0.98173516
0.97716895 0.97260274 0.96818182 0.96818182]
mean value: 0.9717766708177666
key: test_roc_auc
value: [0.91916667 0.94 0.83666667 0.91916667 0.81666667 0.8775
0.83583333 0.8775 0.85416667 0.9375 ]
mean value: 0.8814166666666667
key: train_roc_auc
value: [0.94757161 0.95440017 0.96124948 0.94757161 0.96131175 0.96132213
0.96131175 0.95448319 0.95454545 0.95681818]
mean value: 0.9560585305105853
key: test_jcc
value: [0.85185185 0.88888889 0.71428571 0.85185185 0.68965517 0.78571429
0.73333333 0.78571429 0.75 0.88461538]
mean value: 0.793591076866939
key: train_jcc
value: [0.90212766 0.91452991 0.92640693 0.90212766 0.92640693 0.92672414
0.92640693 0.91416309 0.91416309 0.91810345]
mean value: 0.9171159779364038
MCC on Blind test: 0.79
Accuracy on Blind test: 0.91
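Note: the FutureWarning in the Random Forest2 record says that max_features='auto' is deprecated and that 'sqrt' keeps the past behaviour. A minimal sketch of that change follows. The parameter values are copied from the logged model, while the helper name replace_deprecated_max_features is hypothetical and not taken from the original script.

```python
def replace_deprecated_max_features(params):
    """Return a copy of params with max_features='auto' swapped for 'sqrt'."""
    updated = dict(params)
    if updated.get("max_features") == "auto":
        updated["max_features"] = "sqrt"
    return updated


# Keyword arguments copied from the logged 'Random Forest2' model.
rf2_params = {
    "max_features": "auto",
    "min_samples_leaf": 5,
    "n_estimators": 1000,
    "n_jobs": 10,
    "oob_score": True,
    "random_state": 42,
}

rf2_fixed = replace_deprecated_max_features(rf2_params)
print(rf2_fixed["max_features"])

# With scikit-learn installed, the model would then be built as:
#   from sklearn.ensemble import RandomForestClassifier
#   model = RandomForestClassifier(**rf2_fixed)
```

According to the warning text, 'sqrt' is also the default for RandomForestClassifier, so dropping the argument entirely should have the same effect.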
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.023139 0.01131988 0.01124835 0.01410437 0.01195288 0.01177645
0.01184845 0.01120639 0.01069093 0.01091528]
mean value: 0.012820196151733399
key: score_time
value: [0.00903487 0.00906825 0.01109552 0.01174092 0.01075888 0.01009846
0.0100162 0.01012874 0.01001716 0.0095706 ]
mean value: 0.010152959823608398
key: test_mcc
value: [0.715 0.38833333 0.39196475 0.51252158 0.34666667 0.59839104
0.63333333 0.55166667 0.58536941 0.58333333]
mean value: 0.5306580112563036
key: train_mcc
value: [0.59080781 0.61303798 0.60465079 0.62803099 0.60397855 0.62220982
0.59975096 0.59009009 0.5917901 0.56832862]
mean value: 0.6012675717929086
key: test_accuracy
value: [0.85714286 0.69387755 0.69387755 0.75510204 0.67346939 0.79591837
0.81632653 0.7755102 0.79166667 0.79166667]
mean value: 0.7644557823129251
key: train_accuracy
value: [0.79498861 0.80637813 0.80182232 0.81321185 0.80182232 0.81093394
0.79954442 0.79498861 0.79545455 0.78409091]
mean value: 0.8003235659556844
key: test_fscore
value: [0.85714286 0.69387755 0.70588235 0.76 0.68 0.81481481
0.81632653 0.7755102 0.8 0.79166667]
mean value: 0.7695220977279801
key: train_fscore
value: [0.80088496 0.8098434 0.80794702 0.82017544 0.80449438 0.81348315
0.80357143 0.79638009 0.80088496 0.7816092 ]
mean value: 0.8039274012977246
key: test_precision
value: [0.84 0.68 0.66666667 0.73076923 0.68 0.75862069
0.83333333 0.79166667 0.76923077 0.79166667]
mean value: 0.7541954022988506
key: train_precision
value: [0.78017241 0.79735683 0.78540773 0.79237288 0.7920354 0.80088496
0.7860262 0.78923767 0.78017241 0.79069767]
mean value: 0.7894364159893563
key: test_recall
value: [0.875 0.70833333 0.75 0.79166667 0.68 0.88
0.8 0.76 0.83333333 0.79166667]
mean value: 0.787
key: train_recall
value: [0.82272727 0.82272727 0.83181818 0.85 0.8173516 0.82648402
0.82191781 0.80365297 0.82272727 0.77272727]
mean value: 0.8192133665421336
key: test_roc_auc
value: [0.8575 0.69416667 0.695 0.75583333 0.67333333 0.79416667
0.81666667 0.77583333 0.79166667 0.79166667]
mean value: 0.7645833333333334
key: train_roc_auc
value: [0.79492528 0.80634081 0.80175384 0.81312785 0.80185762 0.81096928
0.79959527 0.7950083 0.79545455 0.78409091]
mean value: 0.8003123702781237
key: test_jcc
value: [0.75 0.53125 0.54545455 0.61290323 0.51515152 0.6875
0.68965517 0.63333333 0.66666667 0.65517241]
mean value: 0.6287086872619408
key: train_jcc
value: [0.66789668 0.68045113 0.67777778 0.69516729 0.67293233 0.68560606
0.67164179 0.66165414 0.66789668 0.64150943]
mean value: 0.6722533301554774
MCC on Blind test: 0.61
Accuracy on Blind test: 0.84
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [0.24321389 0.33406544 0.34369111 0.25171113 0.23831272 0.22113085
0.21617317 0.21385956 0.24165797 7.08838511]
mean value: 0.9392200946807862
key: score_time
value: [0.01211691 0.01195264 0.01236773 0.0118885 0.01287055 0.0118053
0.01144505 0.01131988 0.01220369 0.01409149]
mean value: 0.012206172943115235
key: test_mcc
value: [0.84852814 0.84852814 0.67333333 0.87833333 0.83920658 0.7145252
0.755 0.79632832 0.79235477 0.91986621]
mean value: 0.8066004026133584
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.91836735 0.91836735 0.83673469 0.93877551 0.91836735 0.85714286
0.87755102 0.89795918 0.89583333 0.95833333]
mean value: 0.9017431972789116
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.92307692 0.92307692 0.83333333 0.93877551 0.92307692 0.8627451
0.88 0.90196078 0.89795918 0.95652174]
mean value: 0.904052641792503
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.85714286 0.85714286 0.83333333 0.92 0.88888889 0.84615385
0.88 0.88461538 0.88 1. ]
mean value: 0.8847277167277168
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [1. 1. 0.83333333 0.95833333 0.96 0.88
0.88 0.92 0.91666667 0.91666667]
mean value: 0.9265
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.92 0.92 0.83666667 0.93916667 0.9175 0.85666667
0.8775 0.8975 0.89583333 0.95833333]
mean value: 0.9019166666666667
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.85714286 0.85714286 0.71428571 0.88461538 0.85714286 0.75862069
0.78571429 0.82142857 0.81481481 0.91666667]
mean value: 0.8267574698609181
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.83
Accuracy on Blind test: 0.93
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.14163303 0.13927984 0.1110971 0.13177848 0.04264712 0.04237604
0.12133455 0.12579513 0.10211635 0.13984275]
mean value: 0.1097900390625
key: score_time
value: [0.02664208 0.03177047 0.0274868 0.01261592 0.01301384 0.01302195
0.02215147 0.02167702 0.04612851 0.03137469]
mean value: 0.02458827495574951
key: test_mcc
value: [0.55091896 0.51 0.36080239 0.67612782 0.43604918 0.5943247
0.46911585 0.59166667 0.45873171 0.71393289]
mean value: 0.5361670172034785
key: train_mcc
value: [0.8047578 0.80035377 0.8409621 0.79158975 0.83771907 0.79968448
0.79601091 0.81329265 0.80119274 0.80082773]
mean value: 0.808639100168264
key: test_accuracy
value: [0.7755102 0.75510204 0.67346939 0.83673469 0.71428571 0.79591837
0.73469388 0.79591837 0.72916667 0.85416667]
mean value: 0.7664965986394557
key: train_accuracy
value: [0.90205011 0.89977221 0.92027335 0.8952164 0.91799544 0.89977221
0.89749431 0.90660592 0.9 0.9 ]
mean value: 0.9039179954441914
key: test_fscore
value: [0.76595745 0.75 0.7037037 0.84 0.69565217 0.79166667
0.74509804 0.8 0.73469388 0.8627451 ]
mean value: 0.7689517005897848
key: train_fscore
value: [0.90423163 0.90222222 0.92170022 0.89823009 0.92035398 0.90045249
0.89977728 0.90702948 0.90265487 0.90222222]
mean value: 0.9058874482042989
key: test_precision
value: [0.7826087 0.75 0.63333333 0.80769231 0.76190476 0.82608696
0.73076923 0.8 0.72 0.81481481]
mean value: 0.7627210100688362
key: train_precision
value: [0.88646288 0.8826087 0.90748899 0.875 0.89270386 0.89237668
0.87826087 0.9009009 0.87931034 0.8826087 ]
mean value: 0.8877721919753557
key: test_recall
value: [0.75 0.75 0.79166667 0.875 0.64 0.76
0.76 0.8 0.75 0.91666667]
mean value: 0.7793333333333333
key: train_recall
value: [0.92272727 0.92272727 0.93636364 0.92272727 0.94977169 0.9086758
0.92237443 0.91324201 0.92727273 0.92272727]
mean value: 0.9248609381486094
key: test_roc_auc
value: [0.775 0.755 0.67583333 0.8375 0.71583333 0.79666667
0.73416667 0.79583333 0.72916667 0.85416667]
mean value: 0.7669166666666667
key: train_roc_auc
value: [0.90200291 0.8997198 0.92023661 0.89515359 0.91806766 0.89979244
0.89755085 0.906621 0.9 0.9 ]
mean value: 0.9039144873391449
key: test_jcc
value: [0.62068966 0.6 0.54285714 0.72413793 0.53333333 0.65517241
0.59375 0.66666667 0.58064516 0.75862069]
mean value: 0.6275872993802638
key: train_jcc
value: [0.82520325 0.82186235 0.85477178 0.81526104 0.85245902 0.81893004
0.81781377 0.82987552 0.82258065 0.82186235]
mean value: 0.8280619763359249
MCC on Blind test: 0.64
Accuracy on Blind test: 0.85
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01937389 0.05064607 0.05823445 0.03910899 0.03520322 0.01185274
0.011024 0.01107717 0.01257539 0.0136404 ]
mean value: 0.026273632049560548
key: score_time
value: [0.02086258 0.0563364 0.03695273 0.0254395 0.01412416 0.00974655
0.00964689 0.01003909 0.01183605 0.01182985]
mean value: 0.020681381225585938
key: test_mcc
value: [0.6750504 0.47 0.47 0.46911585 0.43071846 0.42833333
0.43071846 0.59166667 0.58536941 0.5 ]
mean value: 0.5050972578617717
key: train_mcc
value: [0.51263368 0.54898298 0.56723544 0.54913848 0.55809465 0.5626401
0.57175176 0.53074991 0.54095939 0.54091468]
mean value: 0.5483101071639697
key: test_accuracy
value: [0.83673469 0.73469388 0.73469388 0.73469388 0.71428571 0.71428571
0.71428571 0.79591837 0.79166667 0.75 ]
mean value: 0.752125850340136
key: train_accuracy
value: [0.75626424 0.77448747 0.78359909 0.77448747 0.77904328 0.78132118
0.78587699 0.76537585 0.77045455 0.77045455]
mean value: 0.7741364671774694
key: test_fscore
value: [0.82608696 0.73469388 0.73469388 0.72340426 0.70833333 0.72
0.70833333 0.8 0.8 0.75 ]
mean value: 0.7505545633609596
key: train_fscore
value: [0.75955056 0.77448747 0.78555305 0.77241379 0.77904328 0.78082192
0.78538813 0.76430206 0.77200903 0.77097506]
mean value: 0.7744544345207075
key: test_precision
value: [0.86363636 0.72 0.72 0.73913043 0.73913043 0.72
0.73913043 0.8 0.76923077 0.75 ]
mean value: 0.7560258437214958
key: train_precision
value: [0.75111111 0.77625571 0.78026906 0.78139535 0.77727273 0.78082192
0.78538813 0.76605505 0.76681614 0.76923077]
mean value: 0.7734615957541756
key: test_recall
value: [0.79166667 0.75 0.75 0.70833333 0.68 0.72
0.68 0.8 0.83333333 0.75 ]
mean value: 0.7463333333333333
key: train_recall
value: [0.76818182 0.77272727 0.79090909 0.76363636 0.78082192 0.78082192
0.78538813 0.76255708 0.77727273 0.77272727]
mean value: 0.7755043586550436
key: test_roc_auc
value: [0.83583333 0.735 0.735 0.73416667 0.715 0.71416667
0.715 0.79583333 0.79166667 0.75 ]
mean value: 0.7521666666666667
key: train_roc_auc
value: [0.75623703 0.77449149 0.7835824 0.77451225 0.77904732 0.78132005
0.78587588 0.76536945 0.77045455 0.77045455]
mean value: 0.774134495641345
key: test_jcc
value: [0.7037037 0.58064516 0.58064516 0.56666667 0.5483871 0.5625
0.5483871 0.66666667 0.66666667 0.6 ]
mean value: 0.6024268219832736
key: train_jcc
value: [0.61231884 0.63197026 0.64684015 0.62921348 0.6380597 0.64044944
0.64661654 0.61851852 0.62867647 0.62730627]
mean value: 0.6319969675865363
MCC on Blind test: 0.55
Accuracy on Blind test: 0.82
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.01522183 0.04666972 0.06443954 0.04834628 0.04992676 0.06265259
0.05166793 0.04743147 0.0382247 0.01697612]
mean value: 0.04415569305419922
key: score_time
value: [0.01111174 0.019274 0.01320052 0.01206517 0.01248288 0.01242256
0.01222825 0.01206899 0.01885581 0.0120883 ]
mean value: 0.013579821586608887
key: test_mcc
value: [0.67612782 0.61216708 0.3492027 0.7202771 0.47140452 0.71889189
0.39836231 0.71889189 0.57735027 0.30151134]
mean value: 0.5544186940394943
key: train_mcc
value: [0.64178459 0.73768986 0.77430537 0.70066939 0.74181793 0.72352791
0.68212557 0.66949315 0.57783716 0.24731124]
mean value: 0.6496562173213511
key: test_accuracy
value: [0.83673469 0.79591837 0.67346939 0.85714286 0.73469388 0.85714286
0.69387755 0.85714286 0.75 0.58333333]
mean value: 0.7639455782312925
key: train_accuracy
value: [0.81321185 0.86332574 0.88610478 0.84510251 0.86332574 0.85876993
0.83143508 0.83371298 0.76363636 0.56136364]
mean value: 0.811998861047836
key: test_fscore
value: [0.84 0.81481481 0.63636364 0.8627451 0.75471698 0.86792453
0.66666667 0.86792453 0.8 0.70588235]
mean value: 0.781703860656136
key: train_fscore
value: [0.83196721 0.87447699 0.88207547 0.85774059 0.87551867 0.86695279
0.80829016 0.83956044 0.80377358 0.69413629]
mean value: 0.8334492191440513
key: test_precision
value: [0.80769231 0.73333333 0.7 0.81481481 0.71428571 0.82142857
0.75 0.82142857 0.66666667 0.54545455]
mean value: 0.7375104525104524
key: train_precision
value: [0.75746269 0.81007752 0.91666667 0.79457364 0.80228137 0.81781377
0.93413174 0.80932203 0.68709677 0.53284672]
mean value: 0.7862272909975274
key: test_recall
value: [0.875 0.91666667 0.58333333 0.91666667 0.8 0.92
0.6 0.92 1. 1. ]
mean value: 0.8531666666666666
key: train_recall
value: [0.92272727 0.95 0.85 0.93181818 0.96347032 0.92237443
0.71232877 0.87214612 0.96818182 0.99545455]
mean value: 0.9088501452885014
key: test_roc_auc
value: [0.8375 0.79833333 0.67166667 0.85833333 0.73333333 0.85583333
0.69583333 0.85583333 0.75 0.58333333]
mean value: 0.7639999999999999
key: train_roc_auc
value: [0.81296181 0.86312785 0.88618721 0.84490452 0.86355334 0.85891449
0.83116438 0.83380033 0.76363636 0.56136364]
mean value: 0.8119613947696139
key: test_jcc
value: [0.72413793 0.6875 0.46666667 0.75862069 0.60606061 0.76666667
0.5 0.76666667 0.66666667 0.54545455]
mean value: 0.6488440438871473
key: train_jcc
value: [0.7122807 0.77695167 0.78902954 0.75091575 0.77859779 0.76515152
0.67826087 0.72348485 0.67192429 0.5315534 ]
mean value: 0.7178150368856082
MCC on Blind test: 0.54
Accuracy on Blind test: 0.83
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.02800584 0.04849577 0.05744505 0.06142521 0.03895092 0.02554607
0.02476668 0.02866244 0.02244735 0.02189159]
mean value: 0.03576369285583496
key: score_time
value: [0.01322865 0.01706743 0.02451825 0.02228951 0.01022005 0.01208138
0.01217604 0.01215816 0.01213026 0.01219058]
mean value: 0.014806032180786133
key: test_mcc
value: [0.69302938 0.63333333 0.3932917 0.68145382 0.21189236 0.48412292
0.47404284 0.59166667 0.58536941 0.71393289]
mean value: 0.5462135324503263
key: train_mcc
value: [0.51702284 0.76028119 0.74181793 0.71419653 0.53065219 0.45848334
0.77681141 0.74500769 0.71877954 0.72874979]
mean value: 0.6691802430417745
key: test_accuracy
value: [0.83673469 0.81632653 0.69387755 0.83673469 0.57142857 0.69387755
0.73469388 0.79591837 0.79166667 0.85416667]
mean value: 0.7625425170068028
key: train_accuracy
value: [0.72209567 0.87927107 0.86332574 0.85193622 0.73120729 0.67881549
0.88610478 0.87243736 0.85454545 0.86363636]
mean value: 0.8203375440049699
key: test_fscore
value: [0.80952381 0.81632653 0.65116279 0.81818182 0.36363636 0.76923077
0.72340426 0.8 0.7826087 0.8627451 ]
mean value: 0.7396820130893219
key: train_fscore
value: [0.62804878 0.88351648 0.84848485 0.83870968 0.64242424 0.75478261
0.87922705 0.87330317 0.84158416 0.86784141]
mean value: 0.8057922429696769
key: test_precision
value: [0.94444444 0.8 0.73684211 0.9 0.75 0.625
0.77272727 0.8 0.81818182 0.81481481]
mean value: 0.7962010455431509
key: train_precision
value: [0.9537037 0.85531915 0.95454545 0.92349727 0.95495495 0.60955056
0.93333333 0.86547085 0.92391304 0.84188034]
mean value: 0.8816168662407472
key: test_recall
value: [0.70833333 0.83333333 0.58333333 0.75 0.24 1.
0.68 0.8 0.75 0.91666667]
mean value: 0.7261666666666666
key: train_recall
value: [0.46818182 0.91363636 0.76363636 0.76818182 0.48401826 0.99086758
0.83105023 0.88127854 0.77272727 0.89545455]
mean value: 0.7769032793690328
key: test_roc_auc
value: [0.83416667 0.81666667 0.69166667 0.835 0.57833333 0.6875
0.73583333 0.79583333 0.79166667 0.85416667]
mean value: 0.7620833333333333
key: train_roc_auc
value: [0.72267538 0.87919261 0.86355334 0.85212744 0.7306455 0.6795247
0.88597966 0.87245745 0.85454545 0.86363636]
mean value: 0.8204337899543379
key: test_jcc
value: [0.68 0.68965517 0.48275862 0.69230769 0.22222222 0.625
0.56666667 0.66666667 0.64285714 0.75862069]
mean value: 0.6026754873479011
key: train_jcc
value: [0.45777778 0.79133858 0.73684211 0.72222222 0.47321429 0.60614525
0.78448276 0.7751004 0.72649573 0.76653696]
mean value: 0.6840156076754643
MCC on Blind test: 0.63
Accuracy on Blind test: 0.83
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.19333434 0.1770587 0.17805886 0.17734861 0.17584944 0.17565274
0.17738605 0.1771369 0.17707872 0.18207049]
mean value: 0.17909748554229737
key: score_time
value: [0.0158093 0.01619959 0.01555848 0.01538253 0.01540327 0.01564074
0.01553726 0.01590323 0.01605821 0.01594472]
mean value: 0.015743732452392578
key: test_mcc
value: [0.88443328 0.75793094 0.63272208 0.87833333 0.68353656 0.715
0.755 0.7145252 0.79235477 0.87576054]
mean value: 0.7689596704149929
key: train_mcc
value: [0.95036567 0.97747332 0.96819862 0.96371187 0.9682006 0.96836207
0.95480125 0.9635941 0.96827185 0.95470327]
mean value: 0.9637682614354717
key: test_accuracy
value: [0.93877551 0.87755102 0.81632653 0.93877551 0.83673469 0.85714286
0.87755102 0.85714286 0.89583333 0.9375 ]
mean value: 0.8833333333333333
key: train_accuracy
value: [0.97494305 0.98861048 0.98405467 0.98177677 0.98405467 0.98405467
0.97722096 0.98177677 0.98409091 0.97727273]
mean value: 0.9817855663698488
key: test_fscore
value: [0.94117647 0.88 0.80851064 0.93877551 0.82608696 0.85714286
0.88 0.8627451 0.89795918 0.93877551]
mean value: 0.8831172224671552
key: train_fscore
value: [0.9753915 0.98876404 0.98419865 0.98198198 0.98412698 0.98419865
0.97747748 0.98181818 0.98419865 0.97747748]
mean value: 0.9819633583501938
key: test_precision
value: [0.88888889 0.84615385 0.82608696 0.92 0.9047619 0.875
0.88 0.84615385 0.88 0.92 ]
mean value: 0.8787045442480225
key: train_precision
value: [0.96035242 0.97777778 0.97757848 0.97321429 0.97747748 0.97321429
0.96444444 0.97737557 0.97757848 0.96875 ]
mean value: 0.9727763210319266
key: test_recall
value: [1. 0.91666667 0.79166667 0.95833333 0.76 0.84
0.88 0.88 0.91666667 0.95833333]
mean value: 0.8901666666666667
key: train_recall
value: [0.99090909 1. 0.99090909 0.99090909 0.99086758 0.99543379
0.99086758 0.98630137 0.99090909 0.98636364]
mean value: 0.9913470319634703
key: test_roc_auc
value: [0.94 0.87833333 0.81583333 0.93916667 0.83833333 0.8575
0.8775 0.85666667 0.89583333 0.9375 ]
mean value: 0.8836666666666667
key: train_roc_auc
value: [0.9749066 0.98858447 0.98403902 0.98175592 0.98407015 0.98408053
0.97725197 0.98178705 0.98409091 0.97727273]
mean value: 0.9817839352428394
key: test_jcc
value: [0.88888889 0.78571429 0.67857143 0.88461538 0.7037037 0.75
0.78571429 0.75862069 0.81481481 0.88461538]
mean value: 0.7935258866293349
key: train_jcc
value: [0.95196507 0.97777778 0.96888889 0.96460177 0.96875 0.96888889
0.95594714 0.96428571 0.96888889 0.95594714]
mean value: 0.9645941267271599
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
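Note on the fscore and jcc rows above: for a binary classifier the Jaccard index J and the F1 score F are tied by J = F / (2 - F), so the test_jcc row carries no information beyond test_fscore. The sketch below is not part of the original script. It copies the ten fold values from the block above and checks the identity; the names `test_fscore`, `test_jcc` and `jaccard_from_f1` are illustrative only.

```python
# Fold values copied from the test_fscore and test_jcc rows logged above.
test_fscore = [0.94117647, 0.88, 0.80851064, 0.93877551, 0.82608696,
               0.85714286, 0.88, 0.8627451, 0.89795918, 0.93877551]
test_jcc = [0.88888889, 0.78571429, 0.67857143, 0.88461538, 0.7037037,
            0.75, 0.78571429, 0.75862069, 0.81481481, 0.88461538]


def jaccard_from_f1(f1):
    """Binary Jaccard index implied by an F1 score: J = F / (2 - F)."""
    return f1 / (2.0 - f1)


# Largest disagreement between the derived and the logged Jaccard values.
max_abs_diff = max(
    abs(jaccard_from_f1(f) - j) for f, j in zip(test_fscore, test_jcc)
)
print(f"largest |J(F) - logged jcc| over {len(test_jcc)} folds: {max_abs_diff:.2e}")
```

Any remaining difference comes from the eight-digit rounding of the printed arrays.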
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.06887841 0.06254005 0.09455132 0.07779312 0.06916332 0.08109474
0.09326601 0.07408905 0.09191942 0.07600975]
mean value: 0.07893052101135253
key: score_time
value: [0.02582598 0.0283165 0.02938533 0.03985167 0.02802324 0.02642775
0.03644276 0.03282738 0.02019882 0.02627158]
mean value: 0.029357099533081056
key: test_mcc
value: [0.8136762 0.75793094 0.67333333 0.80235519 0.67612782 0.55612092
0.755 0.7145252 0.83624201 0.9591663 ]
mean value: 0.7544477928372326
key: train_mcc
value: [0.99545455 0.97723074 0.96811889 0.99088834 0.98642422 0.96819862
0.98177667 0.98177667 0.9773636 0.98181818]
mean value: 0.9809050481225259
key: test_accuracy
value: [0.89795918 0.87755102 0.83673469 0.89795918 0.83673469 0.7755102
0.87755102 0.85714286 0.91666667 0.97916667]
mean value: 0.875297619047619
key: train_accuracy
value: [0.9977221 0.98861048 0.98405467 0.99544419 0.99316629 0.98405467
0.99088838 0.99088838 0.98863636 0.99090909]
mean value: 0.9904374611720853
key: test_fscore
value: [0.90566038 0.88 0.83333333 0.90196078 0.83333333 0.76595745
0.88 0.8627451 0.92 0.9787234 ]
mean value: 0.8761713777441928
key: train_fscore
value: [0.9977221 0.98866213 0.98412698 0.99545455 0.99310345 0.98390805
0.99086758 0.99086758 0.98871332 0.99090909]
mean value: 0.9904334820036527
key: test_precision
value: [0.82758621 0.84615385 0.83333333 0.85185185 0.86956522 0.81818182
0.88 0.84615385 0.88461538 1. ]
mean value: 0.8657441504577936
key: train_precision
value: [1. 0.98642534 0.98190045 0.99545455 1. 0.99074074
0.99086758 0.99086758 0.98206278 0.99090909]
mean value: 0.9909228109045991
key: test_recall
value: [1. 0.91666667 0.83333333 0.95833333 0.8 0.72
0.88 0.88 0.95833333 0.95833333]
mean value: 0.8905000000000001
key: train_recall
value: [0.99545455 0.99090909 0.98636364 0.99545455 0.98630137 0.97716895
0.99086758 0.99086758 0.99545455 0.99090909]
mean value: 0.989975093399751
key: test_roc_auc
value: [0.9 0.87833333 0.83666667 0.89916667 0.8375 0.77666667
0.8775 0.85666667 0.91666667 0.97916667]
mean value: 0.8758333333333334
key: train_roc_auc
value: [0.99772727 0.98860523 0.9840494 0.99544417 0.99315068 0.98403902
0.99088834 0.99088834 0.98863636 0.99090909]
mean value: 0.990433789954338
key: test_jcc
value: [0.82758621 0.78571429 0.71428571 0.82142857 0.71428571 0.62068966
0.78571429 0.75862069 0.85185185 0.95833333]
mean value: 0.7838510308337895
key: train_jcc
value: [0.99545455 0.97757848 0.96875 0.99095023 0.98630137 0.96832579
0.98190045 0.98190045 0.97767857 0.98198198]
mean value: 0.9810821867141358
MCC on Blind test: 0.77
Accuracy on Blind test: 0.9
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.16271353 0.23615193 0.19280815 0.20062351 0.17626858 0.1758728
0.16275883 0.17226195 0.17328191 0.19703627]
mean value: 0.18497774600982667
key: score_time
value: [0.02630448 0.02443695 0.03032565 0.0239799 0.02425122 0.02434754
0.0243566 0.02626634 0.02392149 0.03055239]
mean value: 0.02587425708770752
key: test_mcc
value: [0.47 0.38890873 0.43071846 0.225 0.43071846 0.38890873
0.30666667 0.47 0.58333333 0.58536941]
mean value: 0.42796237929596553
key: train_mcc
value: [0.97731142 0.9863426 0.98177667 0.99092928 0.99545445 0.9863426
0.98634288 0.9818178 0.98185876 0.98637383]
mean value: 0.9854550288338422
key: test_accuracy
value: [0.73469388 0.69387755 0.71428571 0.6122449 0.71428571 0.69387755
0.65306122 0.73469388 0.79166667 0.79166667]
mean value: 0.7134353741496599
key: train_accuracy
value: [0.98861048 0.99316629 0.99088838 0.99544419 0.9977221 0.99316629
0.99316629 0.99088838 0.99090909 0.99318182]
mean value: 0.9927143300890453
key: test_fscore
value: [0.73469388 0.66666667 0.72 0.6122449 0.70833333 0.71698113
0.65306122 0.73469388 0.79166667 0.7826087 ]
mean value: 0.7120950371945333
key: train_fscore
value: [0.98871332 0.99319728 0.99090909 0.99547511 0.99771167 0.99313501
0.99316629 0.99090909 0.99095023 0.99319728]
mean value: 0.9927364366230393
key: test_precision
value: [0.72 0.71428571 0.69230769 0.6 0.73913043 0.67857143
0.66666667 0.75 0.79166667 0.81818182]
mean value: 0.7170810421462596
key: train_precision
value: [0.98206278 0.99095023 0.99090909 0.99099099 1. 0.99541284
0.99090909 0.98642534 0.98648649 0.99095023]
mean value: 0.9905097075456619
key: test_recall
value: [0.75 0.625 0.75 0.625 0.68 0.76
0.64 0.72 0.79166667 0.75 ]
mean value: 0.7091666666666667
key: train_recall
value: [0.99545455 0.99545455 0.99090909 1. 0.99543379 0.99086758
0.99543379 0.99543379 0.99545455 0.99545455]
mean value: 0.9949896222498962
key: test_roc_auc
value: [0.735 0.6925 0.715 0.6125 0.715 0.6925
0.65333333 0.735 0.79166667 0.79166667]
mean value: 0.7134166666666667
key: train_roc_auc
value: [0.98859485 0.99316106 0.99088834 0.99543379 0.99771689 0.99316106
0.99317144 0.99089871 0.99090909 0.99318182]
mean value: 0.9927117061021171
key: test_jcc
value: [0.58064516 0.5 0.5625 0.44117647 0.5483871 0.55882353
0.48484848 0.58064516 0.65517241 0.64285714]
mean value: 0.555505546085357
key: train_jcc
value: [0.97767857 0.98648649 0.98198198 0.99099099 0.99543379 0.98636364
0.98642534 0.98198198 0.98206278 0.98648649]
mean value: 0.9855892045310047
MCC on Blind test: 0.49
Accuracy on Blind test: 0.79
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [0.71926641 0.71162152 0.71367073 0.71375799 0.71037388 0.7147007
0.70813131 0.71393132 0.70967293 0.70716023]
mean value: 0.7122287034988404
key: score_time
value: [0.01039338 0.01002169 0.0094986 0.00955582 0.01005411 0.00989556
0.00942087 0.00945616 0.00969458 0.00933266]
mean value: 0.009732341766357422
key: test_mcc
value: [0.8136762 0.80235519 0.67333333 0.92153718 0.83920658 0.6363961
0.755 0.79632832 0.75261781 1. ]
mean value: 0.7990450713750232
key: train_mcc
value: [0.99545445 1. 1. 0.99545445 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990908902647659
key: test_accuracy
value: [0.89795918 0.89795918 0.83673469 0.95918367 0.91836735 0.81632653
0.87755102 0.89795918 0.875 1. ]
mean value: 0.897704081632653
key: train_accuracy
value: [0.9977221 1. 1. 0.9977221 1. 1. 1.
1. 1. 1. ]
mean value: 0.9995444191343964
key: test_fscore
value: [0.90566038 0.90196078 0.83333333 0.96 0.92307692 0.83018868
0.88 0.90196078 0.88 1. ]
mean value: 0.9016180881641481
key: train_fscore
value: [0.99773243 1. 1. 0.99773243 1. 1.
1. 1. 1. 1. ]
mean value: 0.999546485260771
key: test_precision
value: [0.82758621 0.85185185 0.83333333 0.92307692 0.88888889 0.78571429
0.88 0.88461538 0.84615385 1. ]
mean value: 0.8721220720531065
key: train_precision
value: [0.99547511 1. 1. 0.99547511 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990950226244344
key: test_recall
value: [1. 0.95833333 0.83333333 1. 0.96 0.88
0.88 0.92 0.91666667 1. ]
mean value: 0.9348333333333333
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.9 0.89916667 0.83666667 0.96 0.9175 0.815
0.8775 0.8975 0.875 1. ]
mean value: 0.8978333333333334
key: train_roc_auc
value: [0.99771689 1. 1. 0.99771689 1. 1.
1. 1. 1. 1. ]
mean value: 0.9995433789954338
key: test_jcc
value: [0.82758621 0.82142857 0.71428571 0.92307692 0.85714286 0.70967742
0.78571429 0.82142857 0.78571429 1. ]
mean value: 0.8246054835042599
key: train_jcc
value: [0.99547511 1. 1. 0.99547511 1. 1.
1. 1. 1. 1. ]
mean value: 0.9990950226244344
MCC on Blind test: 0.81
Accuracy on Blind test: 0.92
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.03085756 0.03368568 0.03450561 0.03497577 0.03449321 0.05795598
0.05796933 0.05467415 0.05438256 0.0380888 ]
mean value: 0.043158864974975585
key: score_time
value: [0.01293373 0.01301718 0.03573799 0.01297569 0.01292992 0.01947093
0.01932955 0.0131321 0.01314855 0.01365209]
mean value: 0.016632771492004393
key: test_mcc
value: [-0.05990423 -0.01162476 0.05 0.10143783 0.27326049 0.41666667
0.35343496 0.22571524 0.19245009 0.35355339]
mean value: 0.1894989672081178
key: train_mcc
value: [0.41406034 0.4586092 0.49471053 0.5160817 0.48532893 0.65068532
0.78556244 0.5944114 0.48932261 0.51063195]
mean value: 0.5399404434769084
key: test_accuracy
value: [0.46938776 0.48979592 0.51020408 0.53061224 0.6122449 0.65306122
0.65306122 0.6122449 0.58333333 0.66666667]
mean value: 0.5780612244897959
key: train_accuracy
value: [0.64692483 0.67425968 0.69703872 0.71070615 0.69020501 0.79726651
0.88154897 0.76082005 0.69318182 0.70681818]
mean value: 0.725876993166287
key: test_fscore
value: [0.58064516 0.59016393 0.63636364 0.64615385 0.70769231 0.74626866
0.73015873 0.65454545 0.66666667 0.71428571]
mean value: 0.6672944108299326
key: train_fscore
value: [0.7394958 0.75471698 0.76788831 0.77601411 0.7630662 0.83111954
0.89387755 0.80662983 0.76521739 0.77328647]
mean value: 0.787131218670251
key: test_precision
value: [0.47368421 0.48648649 0.5 0.51219512 0.575 0.5952381
0.60526316 0.6 0.55555556 0.625 ]
mean value: 0.552842262765241
key: train_precision
value: [0.58666667 0.60606061 0.62322946 0.63400576 0.61690141 0.71103896
0.80811808 0.67592593 0.61971831 0.63037249]
mean value: 0.6512037677464642
key: test_recall
value: [0.75 0.75 0.875 0.875 0.92 1.
0.92 0.72 0.83333333 0.83333333]
mean value: 0.8476666666666667
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.475 0.495 0.5175 0.5375 0.60583333 0.64583333
0.6475 0.61 0.58333333 0.66666667]
mean value: 0.5784166666666667
key: train_roc_auc
value: [0.64611872 0.67351598 0.69634703 0.71004566 0.69090909 0.79772727
0.88181818 0.76136364 0.69318182 0.70681818]
mean value: 0.7257845579078456
key: test_jcc
value: [0.40909091 0.41860465 0.46666667 0.47727273 0.54761905 0.5952381
0.575 0.48648649 0.5 0.55555556]
mean value: 0.5031534139092279
key: train_jcc
value: [0.58666667 0.60606061 0.62322946 0.63400576 0.61690141 0.71103896
0.80811808 0.67592593 0.61971831 0.63037249]
mean value: 0.6512037677464642
MCC on Blind test: 0.16
Accuracy on Blind test: 0.43
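The first QDA fold above can be reproduced from a single confusion matrix, which makes the negative MCC easier to read. The counts used below (TP=18, FN=6, FP=20, TN=5) are not printed in the log; they are inferred from the logged fold-1 recall, precision and accuracy and should be treated as an assumption. The sketch is not part of the original script; it recomputes each metric from those counts. The logged test_roc_auc is consistent with (recall + specificity) / 2, which is what ROC AUC reduces to when it is scored on hard class labels rather than on probabilities.

```python
import math

# Assumed fold-1 confusion matrix for QDA (inferred from the log, not logged itself).
tp, fn, fp, tn = 18, 6, 20, 5

recall = tp / (tp + fn)
specificity = tn / (tn + fp)

computed = {
    "test_mcc": (tp * tn - fp * fn)
    / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    "test_accuracy": (tp + tn) / (tp + fn + fp + tn),
    "test_fscore": 2 * tp / (2 * tp + fp + fn),
    "test_precision": tp / (tp + fp),
    "test_recall": recall,
    # ROC AUC on hard labels equals balanced accuracy.
    "test_roc_auc": (recall + specificity) / 2,
    "test_jcc": tp / (tp + fp + fn),
}

# First-fold values copied from the QDA block above.
logged = {
    "test_mcc": -0.05990423,
    "test_accuracy": 0.46938776,
    "test_fscore": 0.58064516,
    "test_precision": 0.47368421,
    "test_recall": 0.75,
    "test_roc_auc": 0.475,
    "test_jcc": 0.40909091,
}

for key, value in logged.items():
    print(f"{key:15s} logged={value:+.8f} recomputed={computed[key]:+.8f}")
```

With these counts the model labels 38 of the 49 fold samples as positive, which explains the high recall alongside near-chance accuracy.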
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.02436137 0.03441906 0.03893018 0.03864837 0.0392921 0.03875279
0.0352807 0.03918028 0.03790474 0.03902316]
mean value: 0.03657927513122559
key: score_time
value: [0.01807237 0.02321362 0.01447272 0.02220654 0.02380419 0.02437162
0.02390718 0.0213306 0.01967859 0.02034736]
mean value: 0.021140480041503908
key: test_mcc
value: [0.755 0.51252158 0.47404284 0.715 0.51252158 0.71889189
0.46911585 0.7145252 0.58536941 0.70894901]
mean value: 0.6165937358298096
key: train_mcc
value: [0.76028119 0.74681841 0.80164338 0.74562127 0.79257426 0.75008329
0.76517092 0.76035543 0.77477899 0.72441305]
mean value: 0.7621740179645166
key: test_accuracy
value: [0.87755102 0.75510204 0.73469388 0.85714286 0.75510204 0.85714286
0.73469388 0.85714286 0.79166667 0.85416667]
mean value: 0.8074404761904762
key: train_accuracy
value: [0.87927107 0.87243736 0.89977221 0.87243736 0.8952164 0.87471526
0.88154897 0.87927107 0.88636364 0.86136364]
mean value: 0.880239697659971
key: test_fscore
value: [0.875 0.76 0.74509804 0.85714286 0.75 0.86792453
0.74509804 0.8627451 0.8 0.85714286]
mean value: 0.8120151419058189
key: train_fscore
value: [0.88351648 0.87719298 0.90350877 0.87555556 0.89867841 0.87695749
0.88546256 0.88300221 0.89035088 0.86593407]
mean value: 0.8840159407660725
key: test_precision
value: [0.875 0.73076923 0.7037037 0.84 0.7826087 0.82142857
0.73076923 0.84615385 0.76923077 0.84 ]
mean value: 0.7939664047707526
key: train_precision
value: [0.85531915 0.84745763 0.87288136 0.85652174 0.86808511 0.85964912
0.85531915 0.85470085 0.86016949 0.83829787]
mean value: 0.8568401467810323
key: test_recall
value: [0.875 0.79166667 0.79166667 0.875 0.72 0.92
0.76 0.88 0.83333333 0.875 ]
mean value: 0.8321666666666667
key: train_recall
value: [0.91363636 0.90909091 0.93636364 0.89545455 0.93150685 0.89497717
0.91780822 0.91324201 0.92272727 0.89545455]
mean value: 0.9130261519302615
key: test_roc_auc
value: [0.8775 0.75583333 0.73583333 0.8575 0.75583333 0.85583333
0.73416667 0.85666667 0.79166667 0.85416667]
mean value: 0.8074999999999999
key: train_roc_auc
value: [0.87919261 0.87235367 0.89968867 0.87238481 0.89529888 0.87476131
0.88163138 0.87934828 0.88636364 0.86136364]
mean value: 0.8802386882523869
key: test_jcc
value: [0.77777778 0.61290323 0.59375 0.75 0.6 0.76666667
0.59375 0.75862069 0.66666667 0.75 ]
mean value: 0.6870135026572735
key: train_jcc
value: [0.79133858 0.78125 0.824 0.77865613 0.816 0.78087649
0.7944664 0.79051383 0.80237154 0.76356589]
mean value: 0.7923038873312278
MCC on Blind test: 0.68
Accuracy on Blind test: 0.87
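One way to read the blocks above is to compare mean train_mcc with mean test_mcc per model: a large gap suggests the model fits the training folds far better than the held-out folds. The sketch below is not part of the original script. It copies the mean values logged above for five models, rounded to four decimals, and ranks them by gap; the names `mean_mcc` and `ranked` are illustrative only.

```python
# (mean train_mcc, mean test_mcc) copied from the "mean value" rows above,
# rounded to four decimals.
mean_mcc = {
    "Bagging Classifier": (0.9809, 0.7544),
    "Gaussian Process": (0.9855, 0.4280),
    "Gradient Boosting": (0.9991, 0.7990),
    "QDA": (0.5399, 0.1895),
    "Ridge Classifier": (0.7622, 0.6166),
}

# Train-minus-test gap per model, largest first.
ranked = sorted(
    ((name, train - test) for name, (train, test) in mean_mcc.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gap in ranked:
    print(f"{name:20s} train-test MCC gap = {gap:.4f}")
```

The gap is only a rough indicator; it does not replace the blind-test MCC reported at the end of each block.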
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.34428215 0.34850931 0.45896435 0.38414645 0.37303638 0.35414553
0.46823978 0.45267653 0.56025004 0.50578809]
mean value: 0.4250038623809814
key: score_time
value: [0.01888943 0.01431537 0.02232623 0.02402639 0.02232146 0.02438354
0.03082514 0.03807449 0.03736305 0.03020906]
mean value: 0.026273417472839355
key: test_mcc
value: [0.79666667 0.56448787 0.47404284 0.715 0.51252158 0.71889189
0.51089422 0.7145252 0.62994079 0.71393289]
mean value: 0.6350903953454979
key: train_mcc
value: [0.68873977 0.70769795 0.80164338 0.69247872 0.79257426 0.69743616
0.73038867 0.76035543 0.71907224 0.71266318]
mean value: 0.7303049743531872
key: test_accuracy
value: [0.89795918 0.7755102 0.73469388 0.85714286 0.75510204 0.85714286
0.75510204 0.85714286 0.8125 0.85416667]
mean value: 0.8156462585034013
key: train_accuracy
value: [0.8428246 0.85193622 0.89977221 0.84510251 0.8952164 0.84738041
0.86332574 0.87927107 0.85681818 0.85454545]
mean value: 0.8636192793539035
key: test_fscore
value: [0.89795918 0.79245283 0.74509804 0.85714286 0.75 0.86792453
0.76923077 0.8627451 0.82352941 0.8627451 ]
mean value: 0.8228827815596486
key: train_fscore
value: [0.85032538 0.85961123 0.90350877 0.85152838 0.89867841 0.85339168
0.86956522 0.88300221 0.86509636 0.86147186]
mean value: 0.869617951203053
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:176: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:179: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
key: test_precision
value: [0.88 0.72413793 0.7037037 0.84 0.7826087 0.82142857
0.74074074 0.84615385 0.77777778 0.81481481]
mean value: 0.7931366081306112
key: train_precision
value: [0.81327801 0.81893004 0.87288136 0.81932773 0.86808511 0.81932773
0.82987552 0.85470085 0.81781377 0.82231405]
mean value: 0.8336534162093092
key: test_recall
value: [0.91666667 0.875 0.79166667 0.875 0.72 0.92
0.8 0.88 0.875 0.91666667]
mean value: 0.857
key: train_recall
value: [0.89090909 0.90454545 0.93636364 0.88636364 0.93150685 0.89041096
0.91324201 0.91324201 0.91818182 0.90454545]
mean value: 0.9089310917393109
key: test_roc_auc
value: [0.89833333 0.7775 0.73583333 0.8575 0.75583333 0.85583333
0.75416667 0.85666667 0.8125 0.85416667]
mean value: 0.8158333333333333
key: train_roc_auc
value: [0.84271482 0.85181611 0.89968867 0.8450083 0.89529888 0.84747821
0.86343919 0.87934828 0.85681818 0.85454545]
mean value: 0.8636156081361561
key: test_jcc
value: [0.81481481 0.65625 0.59375 0.75 0.6 0.76666667
0.625 0.75862069 0.7 0.75862069]
mean value: 0.7023722860791826
key: train_jcc
value: [0.73962264 0.75378788 0.824 0.74144487 0.816 0.74427481
0.76923077 0.79051383 0.76226415 0.75665399]
mean value: 0.7697792942939468
MCC on Blind test: 0.68
Accuracy on Blind test: 0.86
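The SettingWithCopyWarning messages in this log come from calling `sort_values(..., inplace = True)` on `rus_CT` and `rus_BT`, which pandas treats as slices of another DataFrame. A minimal sketch of one common way to avoid the warning follows. The `scores` table, its `resampling` column and its values are hypothetical stand-ins; only the names `rus_CT` and `test_mcc` are taken from the log, and this is not necessarily how the original script builds them.

```python
import pandas as pd

# Hypothetical results table; only 'test_mcc' is a column name seen in the log.
scores = pd.DataFrame(
    {
        "model": ["A", "B", "C", "D"],
        "resampling": ["none", "rus", "rus", "none"],
        "test_mcc": [0.64, 0.72, 0.58, 0.70],
    }
)

# Take an explicit copy of the filtered rows so pandas no longer has to
# guess whether later changes are meant to reach `scores`.
rus_CT = scores[scores["resampling"] == "rus"].copy()

# Assign the sorted result back instead of sorting in place.
rus_CT = rus_CT.sort_values(by=["test_mcc"], ascending=False)

print(rus_CT)
```

Either change on its own (the `.copy()` or the reassignment) is usually enough; using both keeps the intent explicit.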
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegression(random_state=42))])
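The pipeline repr above can be rebuilt as a small sketch, assuming a toy frame in place of the real 176-column feature table. The column names are borrowed from the log, but the values, the `ss_class` categories and the labels `y` are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data: two numerical columns and one categorical column (values invented).
X = pd.DataFrame(
    {
        "ligand_distance": [1.2, 8.5, 3.3, 12.0, 5.1, 9.9],
        "ddg_foldx": [-0.4, 1.7, 0.2, 2.5, -1.1, 0.9],
        "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
    }
)
y = [0, 1, 0, 1, 0, 1]

num_cols = ["ligand_distance", "ddg_foldx"]
cat_cols = ["ss_class"]

# Same layout as the logged repr: scale numerical columns to [0, 1],
# one-hot encode categorical columns, pass anything else through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)

pipe = Pipeline(
    steps=[("prep", prep), ("model", LogisticRegression(random_state=42))]
)
pipe.fit(X, y)
print(pipe.predict(X))
```

Keeping the scaler and encoder inside the pipeline means they are refitted on each training fold during cross-validation, so no information from the held-out fold leaks into preprocessing.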
key: fit_time
value: [0.13643479 0.17528868 0.20533156 0.28455257 0.26884556 0.20184398
0.19602919 0.17463088 0.19731784 0.15040684]
mean value: 0.19906818866729736
key: score_time
value: [0.02278376 0.02012134 0.04163885 0.03131413 0.02843094 0.01659536
0.03433967 0.02756572 0.02681756 0.0273664 ]
mean value: 0.027697372436523437
key: test_mcc
value: [0.65489945 0.65489945 0.72727273 0.79708114 0.77041694 0.72760688
0.74456392 0.67161876 0.74456392 0.74456392]
mean value: 0.7237487107624094
key: train_mcc
value: [0.75948704 0.78052753 0.76025862 0.77108311 0.75585691 0.76998793
0.76207246 0.76521684 0.75396519 0.7645048 ]
mean value: 0.7642960432414326
key: test_accuracy
value: [0.82706767 0.82706767 0.86363636 0.89393939 0.87878788 0.86363636
0.87121212 0.83333333 0.87121212 0.87121212]
mean value: 0.8601105035315562
key: train_accuracy
value: [0.87888982 0.88898234 0.8789916 0.88487395 0.87731092 0.88403361
0.88067227 0.88151261 0.87647059 0.88151261]
mean value: 0.8813250312740739
key: test_fscore
value: [0.82962963 0.82442748 0.86363636 0.90140845 0.88888889 0.86567164
0.87591241 0.84285714 0.87591241 0.87591241]
mean value: 0.8644256824700698
key: train_fscore
value: [0.88292683 0.89320388 0.88349515 0.88816327 0.88071895 0.88798701
0.88322368 0.88582996 0.87960688 0.88508557]
mean value: 0.885024118883971
key: test_precision
value: [0.8115942 0.84375 0.86363636 0.84210526 0.82051282 0.85294118
0.84507042 0.7972973 0.84507042 0.84507042]
mean value: 0.8367048391579148
key: train_precision
value: [0.85511811 0.85981308 0.85179407 0.86349206 0.85691574 0.85871272
0.8647343 0.8546875 0.85782748 0.85917722]
mean value: 0.8582272275472678
key: test_recall
value: [0.84848485 0.80597015 0.86363636 0.96969697 0.96969697 0.87878788
0.90909091 0.89393939 0.90909091 0.90909091]
mean value: 0.8957485300768883
key: train_recall
value: [0.91260504 0.92929293 0.91764706 0.91428571 0.90588235 0.91932773
0.90252101 0.91932773 0.90252101 0.91260504]
mean value: 0.9136015618368559
key: test_roc_auc
value: [0.8272275 0.8272275 0.86363636 0.89393939 0.87878788 0.86363636
0.87121212 0.83333333 0.87121212 0.87121212]
mean value: 0.8601424694708277
key: train_roc_auc
value: [0.87886144 0.88901621 0.8789916 0.88487395 0.87731092 0.88403361
0.88067227 0.88151261 0.87647059 0.88151261]
mean value: 0.8813255807373455
key: test_jcc
value: [0.70886076 0.7012987 0.76 0.82051282 0.8 0.76315789
0.77922078 0.72839506 0.77922078 0.77922078]
mean value: 0.7619887575432768
key: train_jcc
value: [0.79039301 0.80701754 0.79130435 0.79882526 0.78686131 0.79854015
0.79086892 0.79505814 0.78508772 0.79385965]
mean value: 0.7937816054460703
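The `key` / `value` / `mean value` triples above have the shape of a scikit-learn `cross_validate` result with ten folds and training scores enabled. The sketch below reproduces that layout on synthetic data. The dataset, the stratified shuffled splitter and the subset of scorers are assumptions, not the original script's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Scorer names become the suffixes of the result keys (test_mcc, train_mcc, ...).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "jcc": make_scorer(jaccard_score),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_validate(
    LogisticRegression(max_iter=1000, random_state=42),
    X,
    y,
    cv=cv,
    scoring=scoring,
    return_train_score=True,  # adds the train_* keys next to test_*
)

# Print in the same key / value / mean value layout as the log.
for key, value in results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Comparing each `train_*` mean with its `test_*` counterpart, as the log allows, is a quick check for overfitting: a large gap suggests the model fits the training folds much better than the held-out ones.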
MCC on Blind test: 0.69
Accuracy on Blind test: 0.88
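The blind-test lines report MCC next to accuracy, and MCC is the lower of the two in every block shown here. A small worked sketch with invented labels shows how the two metrics are computed with scikit-learn and why they can differ. The label vectors are hypothetical and unrelated to the real blind-test set.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Invented labels: 4 positives and 6 negatives, with one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

# Confusion counts for this toy case: TP=3, FN=1, TN=5, FP=1.
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) = 14 / 24
mcc = matthews_corrcoef(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(f"MCC on Blind test: {mcc:.2f}")
print(f"Accuracy on Blind test: {acc:.2f}")
```

Accuracy only counts correct predictions, while MCC uses all four confusion-matrix cells, so MCC is the stricter summary when classes are imbalanced.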
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
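The ConvergenceWarning above suggests raising `max_iter` or scaling the data. A minimal sketch of that advice follows, assuming synthetic data and an arbitrary `max_iter=5000`; neither is taken from the original script.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the real feature table.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Scale inside the pipeline and give lbfgs a larger iteration budget than
# the default of 100 (5000 is an arbitrary illustrative value).
pipe = Pipeline(
    steps=[
        ("scale", MinMaxScaler()),
        ("model", LogisticRegression(max_iter=5000, random_state=42)),
    ]
)

# Record warnings so that convergence problems can be counted, not just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipe.fit(X, y)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarning count:", n_conv)
print("lbfgs iterations used:", pipe.named_steps["model"].n_iter_)
```

`LogisticRegressionCV` accepts the same `max_iter` parameter. Because it fits one model per fold and per candidate `C`, a single run can emit this warning many times, which may explain the long runs of identical messages in this log.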
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LogisticRegressionCV(random_state=42))])
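Note on the pipeline repr above: it describes a ColumnTransformer that min-max scales the numerical columns, one-hot encodes the categorical ones, passes anything else through, and hands the result to the model. A minimal sketch of how such an object can be assembled is given below. The four column names are taken from the repr, but the values and the category labels ("helix", "yes", and so on) are invented for illustration, and the real script may build its column lists differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy frame: column names appear in the log, values are made up.
toy = pd.DataFrame(
    {
        "ligand_distance": [3.2, 10.5, 7.1, 15.0, 4.4, 12.3],
        "ddg_foldx": [-1.2, 0.4, 2.3, -0.5, 1.1, 0.0],
        "ss_class": ["helix", "sheet", "loop", "helix", "loop", "sheet"],
        "active_site": ["yes", "no", "no", "no", "yes", "no"],
    }
)
numerical_cols = ["ligand_distance", "ddg_foldx"]
categorical_cols = ["ss_class", "active_site"]

# Same step names ('num', 'cat', 'prep', 'model') as in the printed repr.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), numerical_cols),
        ("cat", OneHotEncoder(), categorical_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", LogisticRegressionCV(random_state=42))])

# Only the preprocessing step is run here; six rows are too few to fit the model.
encoded = prep.fit_transform(toy)
print(pipe)
print("Encoded shape:", encoded.shape)
```

The encoded width is the two scaled numerical columns plus one indicator column per distinct category (three for ss_class, two for active_site).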
key: fit_time
value: [4.39447451 2.87602019 1.87464285 1.19950271 2.72753072 2.3022213
2.38066149 2.41723609 2.14946175 2.22808075]
mean value: 2.4549832344055176
key: score_time
value: [0.03131342 0.01659775 0.0143137 0.01979375 0.02363372 0.02368855
0.04590464 0.0226686 0.02403808 0.02302599]
mean value: 0.02449781894683838
key: test_mcc
value: [0.66915423 0.65616669 0.75897093 0.76642417 0.81060226 0.71285802
0.72760688 0.70214689 0.85004744 0.72861209]
mean value: 0.7382589583712581
key: train_mcc
value: [0.78730353 0.83890836 0.80886044 0.82694799 0.76869281 0.81700175
0.81049737 0.78488114 0.82362364 0.77390646]
mean value: 0.8040623492161448
key: test_accuracy
value: [0.83458647 0.82706767 0.87878788 0.87878788 0.90151515 0.85606061
0.86363636 0.84848485 0.92424242 0.86363636]
mean value: 0.867680565048986
key: train_accuracy
value: [0.89318755 0.91925988 0.90420168 0.91344538 0.88403361 0.90840336
0.90504202 0.89159664 0.91176471 0.88655462]
mean value: 0.9017489451625899
key: test_fscore
value: [0.83333333 0.82170543 0.875 0.88732394 0.90780142 0.85925926
0.86567164 0.85714286 0.92647059 0.86764706]
mean value: 0.8701355527043595
key: train_fscore
value: [0.89581624 0.92039801 0.90578512 0.91395155 0.88632619 0.90939318
0.90653433 0.89503662 0.91242702 0.88907149]
mean value: 0.9034739751181698
key: test_precision
value: [0.83333333 0.85483871 0.90322581 0.82894737 0.85333333 0.84057971
0.85294118 0.81081081 0.9 0.84285714]
mean value: 0.8520867391500221
key: train_precision
value: [0.875 0.90686275 0.89105691 0.90863787 0.86914378 0.89967105
0.89250814 0.86750789 0.90562914 0.86977492]
mean value: 0.8885792450788471
key: test_recall
value: [0.83333333 0.79104478 0.84848485 0.95454545 0.96969697 0.87878788
0.87878788 0.90909091 0.95454545 0.89393939]
mean value: 0.8912256897331524
key: train_recall
value: [0.91764706 0.93434343 0.9210084 0.91932773 0.90420168 0.91932773
0.9210084 0.92436975 0.91932773 0.9092437 ]
mean value: 0.9189805619217384
key: test_roc_auc
value: [0.83457711 0.82734057 0.87878788 0.87878788 0.90151515 0.85606061
0.86363636 0.84848485 0.92424242 0.86363636]
mean value: 0.8677069199457259
key: train_roc_auc
value: [0.89316696 0.91927256 0.90420168 0.91344538 0.88403361 0.90840336
0.90504202 0.89159664 0.91176471 0.88655462]
mean value: 0.9017481538069773
key: test_jcc
value: [0.71428571 0.69736842 0.77777778 0.79746835 0.83116883 0.75324675
0.76315789 0.75 0.8630137 0.76623377]
mean value: 0.7713721211562833
key: train_jcc
value: [0.81129272 0.85253456 0.82779456 0.84153846 0.79585799 0.83384146
0.8290469 0.81001473 0.83895706 0.80029586]
mean value: 0.8241174295814014
MCC on Blind test: 0.68
Accuracy on Blind test: 0.87
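Note on the key/value blocks above: the names (fit_time, score_time, then a test_ and train_ entry per metric, ten values each) have the shape of the dictionary returned by scikit-learn's cross_validate with a scoring dict, cv=10 and return_train_score=True. The sketch below reproduces that layout on synthetic data. It is an assumption about how the log was produced, not the original script. The scorer for roc_auc is built from hard class predictions because the logged roc_auc values track accuracy almost exactly, which is an inference from the numbers rather than something the log states.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    jaccard_score,
    make_scorer,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption), split 80/20 with stratification.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Metric names chosen to match the keys seen in the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": make_scorer(accuracy_score),
    "fscore": make_scorer(f1_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "roc_auc": make_scorer(roc_auc_score),  # scored on predicted labels
    "jcc": make_scorer(jaccard_score),
}

cv_results = cross_validate(
    GaussianNB(), X_train, y_train, cv=10, scoring=scoring, return_train_score=True
)
for key, value in cv_results.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", value.mean())

# Blind-test scores come from a model refitted on the whole training split.
blind_pred = GaussianNB().fit(X_train, y_train).predict(X_blind)
print("MCC on Blind test:", round(matthews_corrcoef(y_blind, blind_pred), 2))
print("Accuracy on Blind test:", round(accuracy_score(y_blind, blind_pred), 2))
```

Comparing each train_ mean with its test_ mean, and the test_ mean with the blind-test figure, gives a quick read on overfitting for that model.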
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianNB())])
key: fit_time
value: [0.03249121 0.03886437 0.01911259 0.01886559 0.01927328 0.01915526
0.01913929 0.01914072 0.01931453 0.0187757 ]
mean value: 0.022413253784179688
key: score_time
value: [0.03683639 0.03328943 0.01319766 0.01342869 0.01344657 0.01330543
0.0137465 0.01340461 0.01348186 0.01355147]
mean value: 0.01776885986328125
key: test_mcc
value: [0.39968644 0.44252459 0.3955774 0.41026992 0.46975089 0.50999094
0.51729353 0.42739376 0.5000574 0.55182541]
mean value: 0.46243702683016985
key: train_mcc
value: [0.49973658 0.51693179 0.48338956 0.49447762 0.48289321 0.48000922
0.48867275 0.50527161 0.48144571 0.48144571]
mean value: 0.491427375370929
key: test_accuracy
value: [0.69924812 0.71428571 0.6969697 0.70454545 0.73484848 0.75
0.75757576 0.71212121 0.75 0.77272727]
mean value: 0.7292321713374346
key: train_accuracy
value: [0.74852817 0.7569386 0.74033613 0.74621849 0.74033613 0.73865546
0.74369748 0.7512605 0.7394958 0.7394958 ]
mean value: 0.7444962577125047
key: test_fscore
value: [0.68253968 0.6779661 0.68253968 0.69291339 0.73684211 0.72268908
0.74603175 0.69354839 0.7518797 0.75409836]
mean value: 0.714104822652684
key: train_fscore
value: [0.73516386 0.74265361 0.72582076 0.73415493 0.72727273 0.72404614
0.73408893 0.73758865 0.72566372 0.72566372]
mean value: 0.7312117042117169
key: test_precision
value: [0.71666667 0.78431373 0.71666667 0.72131148 0.73134328 0.81132075
0.78333333 0.74137931 0.74626866 0.82142857]
mean value: 0.7574032444355586
key: train_precision
value: [0.77715356 0.78827977 0.76879699 0.77079482 0.76579926 0.76691729
0.76268116 0.7804878 0.76635514 0.76635514]
mean value: 0.7713620942500627
key: test_recall
value: [0.65151515 0.59701493 0.65151515 0.66666667 0.74242424 0.65151515
0.71212121 0.65151515 0.75757576 0.6969697 ]
mean value: 0.6778833107191315
key: train_recall
value: [0.69747899 0.7020202 0.68739496 0.70084034 0.69243697 0.68571429
0.70756303 0.69915966 0.68907563 0.68907563]
mean value: 0.6950759697818522
key: test_roc_auc
value: [0.6988919 0.71517413 0.6969697 0.70454545 0.73484848 0.75
0.75757576 0.71212121 0.75 0.77272727]
mean value: 0.7292853912256897
key: train_roc_auc
value: [0.74857115 0.75689245 0.74033613 0.74621849 0.74033613 0.73865546
0.74369748 0.7512605 0.7394958 0.7394958 ]
mean value: 0.7444959397900575
key: test_jcc
value: [0.51807229 0.51282051 0.51807229 0.53012048 0.58333333 0.56578947
0.59493671 0.5308642 0.60240964 0.60526316]
mean value: 0.5561682082919598
key: train_jcc
value: [0.58123249 0.59065156 0.56963788 0.57997218 0.57142857 0.5674548
0.57988981 0.58426966 0.56944444 0.56944444]
mean value: 0.5763425846399886
MCC on Blind test: 0.41
Accuracy on Blind test: 0.76
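Note on reading MCC next to accuracy: the two can diverge, as in the blind-test lines above, because MCC uses all four cells of the confusion matrix while accuracy only counts the correct ones. The sketch below computes both from a confusion matrix using the standard MCC formula. The counts are hypothetical and chosen only to illustrate the arithmetic; the log does not report the actual confusion matrices.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (assumption), not taken from this run.
counts = {"tp": 40, "tn": 45, "fp": 5, "fn": 10}
mcc = mcc_from_counts(**counts)
acc = accuracy_from_counts(**counts)
print(f"accuracy={acc:.2f}  mcc={mcc:.2f}")
```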
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.0336113 0.03431821 0.01922846 0.01908755 0.01887369 0.04641771
0.03898787 0.04656267 0.04592562 0.04554367]
mean value: 0.03485567569732666
key: score_time
value: [0.02585936 0.01357031 0.0133636 0.01322436 0.0131259 0.02207923
0.02071619 0.02628255 0.02265048 0.02270865]
mean value: 0.019358062744140626
key: test_mcc
value: [0.5339213 0.62602155 0.37987955 0.57602211 0.58003439 0.5768179
0.66666667 0.59152048 0.66789441 0.63753558]
mean value: 0.5836313947320002
key: train_mcc
value: [0.62153995 0.63685301 0.61201456 0.59876685 0.59876685 0.61017022
0.63706477 0.62017157 0.61512692 0.57557164]
mean value: 0.6126046335536118
key: test_accuracy
value: [0.76691729 0.81203008 0.68939394 0.78787879 0.78787879 0.78787879
0.83333333 0.79545455 0.83333333 0.81818182]
mean value: 0.7912280701754386
key: train_accuracy
value: [0.81076535 0.81833474 0.80588235 0.79915966 0.79915966 0.80504202
0.81848739 0.81008403 0.80756303 0.78739496]
mean value: 0.8061873193347987
key: test_fscore
value: [0.76691729 0.80620155 0.67716535 0.78461538 0.8 0.78125
0.83333333 0.8 0.83823529 0.8125 ]
mean value: 0.7900218210017753
key: train_fscore
value: [0.8104465 0.8202995 0.80306905 0.79520137 0.79520137 0.80338983
0.82 0.80976431 0.80740118 0.78170837]
mean value: 0.8046481487421849
key: test_precision
value: [0.76119403 0.83870968 0.70491803 0.796875 0.75675676 0.80645161
0.83333333 0.7826087 0.81428571 0.83870968]
mean value: 0.7933842530407546
key: train_precision
value: [0.8125 0.81085526 0.81487889 0.81118881 0.81118881 0.81025641
0.81322314 0.81112985 0.80808081 0.80319149]
mean value: 0.8106493474693212
key: test_recall
value: [0.77272727 0.7761194 0.65151515 0.77272727 0.84848485 0.75757576
0.83333333 0.81818182 0.86363636 0.78787879]
mean value: 0.7882180009045681
key: train_recall
value: [0.80840336 0.82996633 0.79159664 0.77983193 0.77983193 0.79663866
0.82689076 0.80840336 0.80672269 0.76134454]
mean value: 0.7989630195512548
key: test_roc_auc
value: [0.76696065 0.81230213 0.68939394 0.78787879 0.78787879 0.78787879
0.83333333 0.79545455 0.83333333 0.81818182]
mean value: 0.7912596110357304
key: train_roc_auc
value: [0.81076734 0.81834451 0.80588235 0.79915966 0.79915966 0.80504202
0.81848739 0.81008403 0.80756303 0.78739496]
mean value: 0.8061884956002603
key: test_jcc
value: [0.62195122 0.67532468 0.51190476 0.64556962 0.66666667 0.64102564
0.71428571 0.66666667 0.72151899 0.68421053]
mean value: 0.6549124479297047
key: train_jcc
value: [0.68130312 0.69534556 0.67094017 0.66002845 0.66002845 0.6713881
0.69491525 0.68033946 0.67700987 0.64164306]
mean value: 0.673294149450316
MCC on Blind test: 0.52
Accuracy on Blind test: 0.81
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', KNeighborsClassifier())])
key: fit_time
value: [0.02682781 0.01738381 0.01693344 0.0169661 0.01902151 0.03203535
0.01671958 0.03684998 0.03678107 0.02850962]
mean value: 0.02480282783508301
key: score_time
value: [0.05919313 0.04242587 0.04569626 0.04686856 0.07781005 0.05949831
0.05094695 0.0685482 0.0651741 0.06896615]
mean value: 0.058512759208679196
key: test_mcc
value: [0.64039914 0.61310775 0.48484848 0.62017367 0.56855832 0.66943868
0.65765844 0.64715023 0.5934603 0.60858062]
mean value: 0.6103375626732229
key: train_mcc
value: [0.74859568 0.7179162 0.73835124 0.7226 0.7328732 0.74237715
0.7368131 0.73393856 0.74262079 0.7275125 ]
mean value: 0.7343598397022401
key: test_accuracy
value: [0.81954887 0.80451128 0.74242424 0.8030303 0.78030303 0.83333333
0.82575758 0.81818182 0.78787879 0.8030303 ]
mean value: 0.8017999544315334
key: train_accuracy
value: [0.87132044 0.85534062 0.86638655 0.85714286 0.86302521 0.86806723
0.86554622 0.86302521 0.86722689 0.85966387]
mean value: 0.8636745093327491
key: test_fscore
value: [0.82352941 0.81690141 0.74242424 0.82191781 0.7972028 0.84057971
0.83687943 0.83333333 0.81081081 0.8115942 ]
mean value: 0.8135173157873363
key: train_fscore
value: [0.87905138 0.86477987 0.87410926 0.8671875 0.87175452 0.87608524
0.87341772 0.87235709 0.87636933 0.86942924]
mean value: 0.8724541163103992
key: test_precision
value: [0.8 0.77333333 0.74242424 0.75 0.74025974 0.80555556
0.78666667 0.76923077 0.73170732 0.77777778]
mean value: 0.7676955402321256
key: train_precision
value: [0.82985075 0.81120944 0.82634731 0.81021898 0.81952663 0.82589286
0.82511211 0.81671554 0.81991215 0.8128655 ]
mean value: 0.819765125314062
key: test_recall
value: [0.84848485 0.86567164 0.74242424 0.90909091 0.86363636 0.87878788
0.89393939 0.90909091 0.90909091 0.84848485]
mean value: 0.8668701944821348
key: train_recall
value: [0.93445378 0.92592593 0.92773109 0.93277311 0.93109244 0.93277311
0.92773109 0.93613445 0.94117647 0.93445378]
mean value: 0.9324245253657019
key: test_roc_auc
value: [0.81976481 0.80404794 0.74242424 0.8030303 0.78030303 0.83333333
0.82575758 0.81818182 0.78787879 0.8030303 ]
mean value: 0.8017752148349164
key: train_roc_auc
value: [0.87126729 0.85539994 0.86638655 0.85714286 0.86302521 0.86806723
0.86554622 0.86302521 0.86722689 0.85966387]
mean value: 0.8636751266163031
key: test_jcc
value: [0.7 0.69047619 0.59036145 0.69767442 0.6627907 0.725
0.7195122 0.71428571 0.68181818 0.68292683]
mean value: 0.6864845673032532
key: train_jcc
value: [0.7842031 0.76177285 0.77637131 0.76551724 0.77266388 0.77949438
0.7752809 0.77361111 0.77994429 0.76901798]
mean value: 0.7737877045149908
MCC on Blind test: 0.45
Accuracy on Blind test: 0.76
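The key/value/mean layout in the block above matches what scikit-learn's `cross_validate` returns when it is given a dictionary of scorers and `return_train_score=True`. The script that wrote this log is not shown here, so the following is only a minimal sketch under stated assumptions: the data are synthetic (`make_classification`), the scorer names in `scoring` are chosen to reproduce the keys seen in the log, and the 10-fold `StratifiedKFold` is inferred from the ten values per key.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (hypothetical; not the feature table used in the log).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Same pipeline shape as the log: scale numeric columns, then fit the model.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(
        transformers=[("num", MinMaxScaler(), list(range(X.shape[1])))],
        remainder="passthrough")),
    ("model", KNeighborsClassifier()),
])

# Assumed scorer names; cross_validate prefixes them with test_/train_.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print each entry in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

A large gap between a `train_*` mean and the matching `test_*` mean in this output is the usual first sign of overfitting.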
Model_name: SVM
Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SVC(random_state=42))])
key: fit_time
value: [0.08626294 0.06776714 0.06693482 0.08487916 0.08456087 0.08712363
0.07345867 0.06638074 0.08891439 0.13808131]
mean value: 0.08443636894226074
key: score_time
value: [0.02497745 0.02349496 0.02379441 0.04615092 0.02772403 0.02862549
0.02358913 0.02368736 0.03861403 0.04184937]
mean value: 0.03025071620941162
key: test_mcc
value: [0.65616669 0.67021931 0.65339283 0.71319972 0.75295561 0.69825325
0.75295561 0.63002408 0.75295561 0.70511024]
mean value: 0.698523295201196
key: train_mcc
value: [0.72978573 0.74735252 0.73100739 0.73527645 0.72393206 0.7254279
0.73067537 0.74376299 0.72326251 0.73466098]
mean value: 0.732514389766043
key: test_accuracy
value: [0.82706767 0.83458647 0.82575758 0.84848485 0.87121212 0.84848485
0.87121212 0.81060606 0.87121212 0.84848485]
mean value: 0.8457108680792891
key: train_accuracy
value: [0.86206897 0.86963835 0.86218487 0.86470588 0.85966387 0.85966387
0.86218487 0.86806723 0.85882353 0.86470588]
mean value: 0.8631707317073171
key: test_fscore
value: [0.83211679 0.84057971 0.83211679 0.8630137 0.88111888 0.85294118
0.88111888 0.82517483 0.88111888 0.85915493]
mean value: 0.8548454559996922
key: train_fscore
value: [0.87025316 0.87843137 0.87086614 0.87272727 0.86714399 0.86819258
0.87066246 0.87686275 0.86708861 0.87232355]
mean value: 0.8714551892097665
key: test_precision
value: [0.8028169 0.81690141 0.8028169 0.7875 0.81818182 0.82857143
0.81818182 0.76623377 0.81818182 0.80263158]
mean value: 0.8062017439565624
key: train_precision
value: [0.82212257 0.82232012 0.81925926 0.8238806 0.82326284 0.81845238
0.82020802 0.82205882 0.81913303 0.82582583]
mean value: 0.8216523473090571
key: test_recall
value: [0.86363636 0.86567164 0.86363636 0.95454545 0.95454545 0.87878788
0.95454545 0.89393939 0.95454545 0.92424242]
mean value: 0.9108095884215287
key: train_recall
value: [0.92436975 0.94276094 0.92941176 0.92773109 0.91596639 0.92436975
0.92773109 0.9394958 0.9210084 0.92436975]
mean value: 0.9277214724273548
key: test_roc_auc
value: [0.82734057 0.83435097 0.82575758 0.84848485 0.87121212 0.84848485
0.87121212 0.81060606 0.87121212 0.84848485]
mean value: 0.8457146087743103
key: train_roc_auc
value: [0.86201652 0.8696998 0.86218487 0.86470588 0.85966387 0.85966387
0.86218487 0.86806723 0.85882353 0.86470588]
mean value: 0.8631716322892794
key: test_jcc
value: [0.7125 0.725 0.7125 0.75903614 0.7875 0.74358974
0.7875 0.70238095 0.7875 0.75308642]
mean value: 0.7470593260302095
key: train_jcc
value: [0.77030812 0.78321678 0.77126918 0.77419355 0.76544944 0.76708508
0.77094972 0.78072626 0.76536313 0.77355837]
mean value: 0.772211962153118
MCC on Blind test: 0.66
Accuracy on Blind test: 0.86
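In the block above, the `test_roc_auc` values nearly coincide with `test_accuracy` fold by fold (for example 0.82575758 for both in the third fold). One possible explanation, which the log itself does not confirm, is that ROC AUC was computed from hard class predictions rather than from probabilities or decision scores. In that case AUC reduces to balanced accuracy, which equals plain accuracy when the two classes are the same size. The pure-Python sketch below illustrates that identity on made-up labels; `roc_auc` and `balanced_accuracy` are hypothetical helper names, not functions from the original script.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Made-up labels and hard 0/1 predictions (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

auc_on_labels = roc_auc(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(auc_on_labels, bal_acc)
```

If this is what happened, a threshold-free AUC would need continuous scores such as `predict_proba` or `decision_function` output instead of predicted labels.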
Model_name: MLP
Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MLPClassifier(max_iter=500, random_state=42))])
key: fit_time
value: [7.32249093 5.55219841 8.19922113 6.65408945 1.15874195 8.76371288
5.59072351 5.6803503 5.23264861 5.51218748]
mean value: 5.96663646697998
key: score_time
value: [0.02065516 0.02101636 0.02099442 0.02098346 0.02147007 0.02328658
0.0287044 0.01435661 0.01431417 0.02102661]
mean value: 0.020680785179138184
key: test_mcc
value: [0.80538602 0.78986657 0.89404202 0.82773811 0.69728992 0.83806027
0.87177979 0.88531564 0.81060226 0.83806027]
mean value: 0.8258140871173387
key: train_mcc
value: [0.97814523 0.92478051 0.95837445 0.95185354 0.74159554 0.98319883
0.97324671 0.99160924 0.95858042 0.97816369]
mean value: 0.9439548161998859
key: test_accuracy
value: [0.90225564 0.89473684 0.9469697 0.90909091 0.84848485 0.91666667
0.93181818 0.93939394 0.90151515 0.91666667]
mean value: 0.9107598541809068
key: train_accuracy
value: [0.98906644 0.96215307 0.9789916 0.97563025 0.87058824 0.99159664
0.98655462 0.99579832 0.9789916 0.98907563]
mean value: 0.9718446402951424
key: test_fscore
value: [0.9037037 0.89393939 0.94736842 0.91549296 0.85074627 0.92086331
0.93617021 0.94285714 0.90780142 0.92086331]
mean value: 0.9139806137866777
key: train_fscore
value: [0.9891031 0.96271748 0.97928749 0.97605285 0.86837607 0.99161074
0.98666667 0.99580889 0.9793559 0.98904802]
mean value: 0.971802720420434
key: test_precision
value: [0.88405797 0.90769231 0.94029851 0.85526316 0.83823529 0.87671233
0.88 0.89189189 0.85333333 0.87671233]
mean value: 0.8804197120941343
key: train_precision
value: [0.98662207 0.94779772 0.96568627 0.95941558 0.88347826 0.98994975
0.9785124 0.99331104 0.96266234 0.99155405]
mean value: 0.9658989483467253
key: test_recall
value: [0.92424242 0.88059701 0.95454545 0.98484848 0.86363636 0.96969697
1. 1. 0.96969697 0.96969697]
mean value: 0.9516960651289009
key: train_recall
value: [0.99159664 0.97811448 0.99327731 0.99327731 0.85378151 0.99327731
0.99495798 0.99831933 0.99663866 0.98655462]
mean value: 0.9779795150383386
key: test_roc_auc
value: [0.90241972 0.89484396 0.9469697 0.90909091 0.84848485 0.91666667
0.93181818 0.93939394 0.90151515 0.91666667]
mean value: 0.91078697421981
key: train_roc_auc
value: [0.98906431 0.96216648 0.9789916 0.97563025 0.87058824 0.99159664
0.98655462 0.99579832 0.9789916 0.98907563]
mean value: 0.9718457686104744
key: test_jcc
value: [0.82432432 0.80821918 0.9 0.84415584 0.74025974 0.85333333
0.88 0.89189189 0.83116883 0.85333333]
mean value: 0.8426686476549491
key: train_jcc
value: [0.97844113 0.92811502 0.95941558 0.95322581 0.7673716 0.98336106
0.97368421 0.99165275 0.95954693 0.97833333]
mean value: 0.9473147424653781
MCC on Blind test: 0.66
Accuracy on Blind test: 0.87
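The `test_jcc` (Jaccard) and `test_fscore` (F1) rows above are not independent. For a binary problem, F1 = 2TP / (2TP + FP + FN) and J = TP / (TP + FP + FN), so J = F1 / (2 - F1) within each fold. The short check below uses the MLP per-fold values copied from this log; `jaccard_from_f1` is a hypothetical helper name introduced for illustration.

```python
# Per-fold MLP values copied from the log block above.
test_fscore = [0.9037037, 0.89393939, 0.94736842, 0.91549296, 0.85074627,
               0.92086331, 0.93617021, 0.94285714, 0.90780142, 0.92086331]
test_jcc = [0.82432432, 0.80821918, 0.9, 0.84415584, 0.74025974,
            0.85333333, 0.88, 0.89189189, 0.83116883, 0.85333333]


def jaccard_from_f1(f1):
    """Convert a binary F1 score to the Jaccard index of the same fold."""
    return f1 / (2.0 - f1)


derived = [jaccard_from_f1(f) for f in test_fscore]

# Largest per-fold disagreement between the derived and the logged Jaccard.
max_abs_diff = max(abs(d - j) for d, j in zip(derived, test_jcc))
print(max_abs_diff)
```

The identity holds fold by fold but not for the reported means, because the mapping is nonlinear: applying it to the mean F1 does not reproduce the mean Jaccard exactly.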
Model_name: Decision Tree
Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', DecisionTreeClassifier(random_state=42))])
key: fit_time
value: [0.06732917 0.04838347 0.05075574 0.05332375 0.0462842 0.06115699
0.05194139 0.05070543 0.05754256 0.05003095]
mean value: 0.053745365142822264
key: score_time
value: [0.0100081 0.00959229 0.00984335 0.00956178 0.00954437 0.00966573
0.00966644 0.00967693 0.00957775 0.01000595]
mean value: 0.009714269638061523
key: test_mcc
value: [0.94028503 0.89560771 0.91287093 0.95553309 0.85839508 0.89901011
0.88040627 0.9701425 0.87177979 0.93939394]
mean value: 0.9123424440936554
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.96992481 0.94736842 0.95454545 0.97727273 0.92424242 0.9469697
0.93939394 0.98484848 0.93181818 0.96969697]
mean value: 0.9546081111870586
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.97014925 0.94890511 0.95652174 0.97777778 0.92957746 0.94964029
0.94117647 0.98507463 0.93617021 0.96969697]
mean value: 0.9564689912603958
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.95588235 0.92857143 0.91666667 0.95652174 0.86842105 0.90410959
0.91428571 0.97058824 0.88 0.96969697]
mean value: 0.9264743748259183
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.97014925 1. 1. 1. 1.
0.96969697 1. 1. 0.96969697]
mean value: 0.9894391677973767
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.97003618 0.94719584 0.95454545 0.97727273 0.92424242 0.9469697
0.93939394 0.98484848 0.93181818 0.96969697]
mean value: 0.9546019900497512
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.94202899 0.90277778 0.91666667 0.95652174 0.86842105 0.90410959
0.88888889 0.97058824 0.88 0.94117647]
mean value: 0.9171179405526042
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.72
Accuracy on Blind test: 0.89
Model_name: Extra Trees
Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreesClassifier(random_state=42))])
key: fit_time
value: [0.20921516 0.21479702 0.22188973 0.2206893 0.22368717 0.22040296
0.19971609 0.21791267 0.21999526 0.21728563]
mean value: 0.216559100151062
key: score_time
value: [0.02080321 0.0215764 0.01975822 0.02135944 0.02216983 0.02127004
0.0214529 0.02133965 0.02135205 0.02098799]
mean value: 0.021206974983215332
key: test_mcc
value: [0.86718264 0.89484396 0.90909091 0.91287093 0.92690611 0.83806027
0.89901011 0.92690611 0.9251987 0.90909091]
mean value: 0.9009160650602038
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.93233083 0.94736842 0.95454545 0.95454545 0.96212121 0.91666667
0.9469697 0.96212121 0.96212121 0.95454545]
mean value: 0.9493335611756665
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.93430657 0.94736842 0.95454545 0.95652174 0.96350365 0.92086331
0.94964029 0.96350365 0.96296296 0.95454545]
mean value: 0.9507761497972379
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.90140845 0.95454545 0.95454545 0.91666667 0.92957746 0.87671233
0.90410959 0.92957746 0.94202899 0.95454545]
mean value: 0.9263717313900186
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.96969697 0.94029851 0.95454545 1. 1. 0.96969697
1. 1. 0.98484848 0.95454545]
mean value: 0.977363184079602
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.93260968 0.94742198 0.95454545 0.95454545 0.96212121 0.91666667
0.9469697 0.96212121 0.96212121 0.95454545]
mean value: 0.949366802351877
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.87671233 0.9 0.91304348 0.91666667 0.92957746 0.85333333
0.90410959 0.92957746 0.92857143 0.91304348]
mean value: 0.9064635232478852
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.49
Accuracy on Blind test: 0.81
Model_name: Extra Tree
Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', ExtraTreeClassifier(random_state=42))])
key: fit_time
value: [0.01371717 0.01392555 0.01349926 0.01353741 0.01367569 0.01371884
0.01378632 0.01376367 0.01417041 0.01406479]
mean value: 0.013785910606384278
key: score_time
value: [0.00944543 0.01027775 0.00953388 0.00994778 0.00947404 0.00971413
0.00948572 0.00943255 0.00957513 0.0105927 ]
mean value: 0.009747910499572753
key: test_mcc
value: [0.71569714 0.8253812 0.82158384 0.86853519 0.8824419 0.82425939
0.83205029 0.92690611 0.81442137 0.86612538]
mean value: 0.8377401802212678
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.84962406 0.90977444 0.90909091 0.93181818 0.93939394 0.90909091
0.90909091 0.96212121 0.90151515 0.93181818]
mean value: 0.9153337890179996
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.8630137 0.91549296 0.91304348 0.9352518 0.94202899 0.91428571
0.91666667 0.96350365 0.90909091 0.93430657]
mean value: 0.9206684427727275
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.7875 0.86666667 0.875 0.89041096 0.90277778 0.86486486
0.84615385 0.92957746 0.84415584 0.90140845]
mean value: 0.8708515874016067
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.95454545 0.97014925 0.95454545 0.98484848 0.98484848 0.96969697
1. 1. 0.98484848 0.96969697]
mean value: 0.9773179556761646
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.85040706 0.90931705 0.90909091 0.93181818 0.93939394 0.90909091
0.90909091 0.96212121 0.90151515 0.93181818]
mean value: 0.9153663500678426
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.75903614 0.84415584 0.84 0.87837838 0.89041096 0.84210526
0.84615385 0.92957746 0.83333333 0.87671233]
mean value: 0.8539863562217576
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.38
Accuracy on Blind test: 0.76
Model_name: Random Forest
Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(n_estimators=1000, random_state=42))])
key: fit_time
value: [3.85031939 3.69451547 3.61622024 3.522048 3.53350186 3.76046658
3.82019472 3.82504964 3.79067063 3.809582 ]
mean value: 3.7222568511962892
key: score_time
value: [0.11130786 0.16276884 0.1515305 0.1129837 0.11014009 0.1188767
0.11364388 0.11909842 0.1191082 0.11907864]
mean value: 0.12385368347167969
key: test_mcc
value: [0.89732778 0.94028503 0.9251987 0.92690611 0.89901011 0.89651574
0.92690611 0.94112395 0.86853519 0.9251987 ]
mean value: 0.9147007416365505
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.96992481 0.96212121 0.96212121 0.9469697 0.9469697
0.96212121 0.96969697 0.93181818 0.96212121]
mean value: 0.95612326270221
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94890511 0.96969697 0.96296296 0.96350365 0.94964029 0.94890511
0.96350365 0.97058824 0.9352518 0.96296296]
mean value: 0.9575920735496123
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91549296 0.98461538 0.94202899 0.92957746 0.90410959 0.91549296
0.92957746 0.94285714 0.89041096 0.94202899]
mean value: 0.9296191891502648
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.95522388 0.98484848 1. 1. 0.98484848
1. 1. 0.98484848 0.98484848]
mean value: 0.9879466304839439
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94764812 0.97003618 0.96212121 0.96212121 0.9469697 0.9469697
0.96212121 0.96969697 0.93181818 0.96212121]
mean value: 0.9561623699683401
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90277778 0.94117647 0.92857143 0.92957746 0.90410959 0.90277778
0.92957746 0.94285714 0.87837838 0.92857143]
mean value: 0.918837492314073
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.74
Accuracy on Blind test: 0.9
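The key / value / mean value blocks above have the same shape as the dictionary returned by scikit-learn's cross_validate when it is called with return_train_score=True over 10 folds. The following is a minimal sketch of how such a summary could be produced, not the code that wrote this log. The synthetic data, the mapping of metric names to scorers (for example 'jcc' to 'jaccard'), the StratifiedKFold settings and the reduced n_estimators are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption): the real feature matrix is not in this log.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# Scorer names chosen so the result keys mirror the log (assumed mapping).
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

# 10 stratified folds, matching the 10 values printed per key in the log.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Fewer trees than the logged n_estimators=1000, only to keep the sketch fast.
model = RandomForestClassifier(n_estimators=50, random_state=42)

scores = cross_validate(
    model, X, y, cv=cv, scoring=scoring, return_train_score=True
)

# Print each metric in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, so "mean value" is simply the fold average for that metric.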
Model_name: Random Forest2
Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn(
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...05', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10,
oob_score=True, random_state=42))])
key: fit_time
value: [1.25330877 2.00703502 2.64860749 1.26994395 1.31967926 1.25739765
1.30199599 2.39054012 3.48491287 3.10396361]
mean value: 2.0037384748458864
key: score_time
value: [0.20351052 0.23993278 0.24853969 0.14206243 0.18108511 0.20508718
0.20811629 0.25352764 0.27486992 0.16690373]
mean value: 0.21236352920532225
key: test_mcc
value: [0.85319305 0.89484396 0.89404202 0.92690611 0.91287093 0.86452993
0.88040627 0.9251987 0.85478752 0.90950859]
mean value: 0.8916287091253218
key: train_mcc
value: [0.95517845 0.95193369 0.94656078 0.95185354 0.94537267 0.95164901
0.93839978 0.94873699 0.95347998 0.94524429]
mean value: 0.9488409181794549
key: test_accuracy
value: [0.92481203 0.94736842 0.9469697 0.96212121 0.95454545 0.93181818
0.93939394 0.96212121 0.92424242 0.95454545]
mean value: 0.9447938026885395
key: train_accuracy
value: [0.97729184 0.97560976 0.97310924 0.97563025 0.97226891 0.97563025
0.96890756 0.97394958 0.97647059 0.97226891]
mean value: 0.9741136892099144
key: test_fscore
value: [0.92753623 0.94736842 0.94736842 0.96350365 0.95652174 0.93333333
0.94117647 0.96296296 0.92857143 0.95522388]
mean value: 0.9463566538807767
key: train_fscore
value: [0.97770438 0.97605285 0.973466 0.97605285 0.97283951 0.97597349
0.96944674 0.9744856 0.9768595 0.97279472]
mean value: 0.974567563469322
key: test_precision
value: [0.88888889 0.95454545 0.94029851 0.92957746 0.91666667 0.91304348
0.91428571 0.94202899 0.87837838 0.94117647]
mean value: 0.9218890009372873
key: train_precision
value: [0.96103896 0.95786062 0.96072013 0.95941558 0.95322581 0.9624183
0.95292208 0.95483871 0.96097561 0.95469256]
mean value: 0.9578108353365855
key: test_recall
value: [0.96969697 0.94029851 0.95454545 1. 1. 0.95454545
0.96969697 0.98484848 0.98484848 0.96969697]
mean value: 0.9728177295341475
key: train_recall
value: [0.99495798 0.99494949 0.98655462 0.99327731 0.99327731 0.98991597
0.98655462 0.99495798 0.99327731 0.99159664]
mean value: 0.9919319242848654
key: test_roc_auc
value: [0.92514699 0.94742198 0.9469697 0.96212121 0.95454545 0.93181818
0.93939394 0.96212121 0.92424242 0.95454545]
mean value: 0.9448326549072817
key: train_roc_auc
value: [0.97727697 0.97562601 0.97310924 0.97563025 0.97226891 0.97563025
0.96890756 0.97394958 0.97647059 0.97226891]
mean value: 0.9741138273491214
key: test_jcc
value: [0.86486486 0.9 0.9 0.92957746 0.91666667 0.875
0.88888889 0.92857143 0.86666667 0.91428571]
mean value: 0.8984521694732962
key: train_jcc
value: [0.95638126 0.95322581 0.94830372 0.95322581 0.94711538 0.95307443
0.94070513 0.95024077 0.95476575 0.9470305 ]
mean value: 0.950406855441748
MCC on Blind test: 0.79
Accuracy on Blind test: 0.92
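The FutureWarning logged for Random Forest2 above states that max_features='auto' is deprecated and that 'sqrt' keeps the previous behaviour for RandomForestClassifier. The following is a minimal sketch of the same estimator configuration with that one parameter changed. The synthetic data, the reduced n_estimators and n_jobs=1 are assumptions made only so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): the real training set is not in this log.
X, y = make_classification(n_samples=200, n_features=16, random_state=42)

# Same settings as the logged 'Random Forest2' model, except:
#   max_features='sqrt' replaces the deprecated 'auto' (per the warning text),
#   n_estimators and n_jobs are reduced here purely for speed (assumption).
rf2 = RandomForestClassifier(
    max_features="sqrt",
    min_samples_leaf=5,
    n_estimators=100,
    n_jobs=1,
    oob_score=True,
    random_state=42,
)
rf2.fit(X, y)

# oob_score=True makes the out-of-bag accuracy available after fitting.
print("OOB score:", rf2.oob_score_)
```

Dropping the max_features argument altogether should behave the same way, since the warning describes 'sqrt' as the default for this estimator.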
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', BernoulliNB())])
key: fit_time
value: [0.03859448 0.04442239 0.0445168 0.04431796 0.03685474 0.04462409
0.04459906 0.04071379 0.04456615 0.04441738]
mean value: 0.04276268482208252
key: score_time
value: [0.02326918 0.02230549 0.0216465 0.02467608 0.02369523 0.02307796
0.02220964 0.02150774 0.02098942 0.02424097]
mean value: 0.022761821746826172
key: test_mcc
value: [0.5339213 0.62602155 0.37987955 0.57602211 0.58003439 0.5768179
0.66666667 0.59152048 0.66789441 0.63753558]
mean value: 0.5836313947320002
key: train_mcc
value: [0.62153995 0.63685301 0.61201456 0.59876685 0.59876685 0.61017022
0.63706477 0.62017157 0.61512692 0.57557164]
mean value: 0.6126046335536118
key: test_accuracy
value: [0.76691729 0.81203008 0.68939394 0.78787879 0.78787879 0.78787879
0.83333333 0.79545455 0.83333333 0.81818182]
mean value: 0.7912280701754386
key: train_accuracy
value: [0.81076535 0.81833474 0.80588235 0.79915966 0.79915966 0.80504202
0.81848739 0.81008403 0.80756303 0.78739496]
mean value: 0.8061873193347987
key: test_fscore
value: [0.76691729 0.80620155 0.67716535 0.78461538 0.8 0.78125
0.83333333 0.8 0.83823529 0.8125 ]
mean value: 0.7900218210017753
key: train_fscore
value: [0.8104465 0.8202995 0.80306905 0.79520137 0.79520137 0.80338983
0.82 0.80976431 0.80740118 0.78170837]
mean value: 0.8046481487421849
key: test_precision
value: [0.76119403 0.83870968 0.70491803 0.796875 0.75675676 0.80645161
0.83333333 0.7826087 0.81428571 0.83870968]
mean value: 0.7933842530407546
key: train_precision
value: [0.8125 0.81085526 0.81487889 0.81118881 0.81118881 0.81025641
0.81322314 0.81112985 0.80808081 0.80319149]
mean value: 0.8106493474693212
key: test_recall
value: [0.77272727 0.7761194 0.65151515 0.77272727 0.84848485 0.75757576
0.83333333 0.81818182 0.86363636 0.78787879]
mean value: 0.7882180009045681
key: train_recall
value: [0.80840336 0.82996633 0.79159664 0.77983193 0.77983193 0.79663866
0.82689076 0.80840336 0.80672269 0.76134454]
mean value: 0.7989630195512548
key: test_roc_auc
value: [0.76696065 0.81230213 0.68939394 0.78787879 0.78787879 0.78787879
0.83333333 0.79545455 0.83333333 0.81818182]
mean value: 0.7912596110357304
key: train_roc_auc
value: [0.81076734 0.81834451 0.80588235 0.79915966 0.79915966 0.80504202
0.81848739 0.81008403 0.80756303 0.78739496]
mean value: 0.8061884956002603
key: test_jcc
value: [0.62195122 0.67532468 0.51190476 0.64556962 0.66666667 0.64102564
0.71428571 0.66666667 0.72151899 0.68421053]
mean value: 0.6549124479297047
key: train_jcc
value: [0.68130312 0.69534556 0.67094017 0.66002845 0.66002845 0.6713881
0.69491525 0.68033946 0.67700987 0.64164306]
mean value: 0.673294149450316
MCC on Blind test: 0.52
Accuracy on Blind test: 0.81
Model_name: XGBoost
Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None,
learning_rate=None, max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None,
scale_pos_weight=None, subsample=None, tree_method=None,
use_label_encoder=False, validate_parameters=None, verbosity=0)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC0...
interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None,
min_child_weight=None, missing=nan,
monotone_constraints=None, n_estimators=100,
n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None,
reg_lambda=None, scale_pos_weight=None,
subsample=None, tree_method=None,
use_label_encoder=False,
validate_parameters=None, verbosity=0))])
key: fit_time
value: [9.06408787 7.39492488 7.35578918 2.32716751 8.98293686 8.69696665
8.82733345 8.63794541 8.75473666 8.13571477]
mean value: 7.817760324478149
key: score_time
value: [0.02943587 0.03138995 0.01382542 0.01349759 0.02445674 0.03002214
0.02363658 0.02672243 0.02811027 0.02797961]
mean value: 0.024907660484313966
key: test_mcc
value: [0.89732778 0.93984622 0.98496155 0.95553309 0.91287093 0.91287093
0.91287093 0.94112395 0.86853519 0.89486432]
mean value: 0.9220804884879608
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_accuracy
value: [0.94736842 0.96992481 0.99242424 0.97727273 0.95454545 0.95454545
0.95454545 0.96969697 0.93181818 0.9469697 ]
mean value: 0.9599111414900888
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_fscore
value: [0.94890511 0.97014925 0.9924812 0.97777778 0.95652174 0.95652174
0.95652174 0.97058824 0.9352518 0.94814815]
mean value: 0.9612866743400412
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_precision
value: [0.91549296 0.97014925 0.98507463 0.95652174 0.91666667 0.91666667
0.91666667 0.94285714 0.89041096 0.92753623]
mean value: 0.9338042911119239
key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_recall
value: [0.98484848 0.97014925 1. 1. 1. 1.
1. 1. 0.98484848 0.96969697]
mean value: 0.9909543193125283
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.94764812 0.96992311 0.99242424 0.97727273 0.95454545 0.95454545
0.95454545 0.96969697 0.93181818 0.9469697 ]
mean value: 0.9599389416553595
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_jcc
value: [0.90277778 0.94202899 0.98507463 0.95652174 0.91666667 0.91666667
0.91666667 0.94285714 0.87837838 0.90140845]
mean value: 0.9259047101220877
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
MCC on Blind test: 0.76
Accuracy on Blind test: 0.91
Model_name: LDA
Model func: LinearDiscriminantAnalysis()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', LinearDiscriminantAnalysis())])
key: fit_time
value: [0.07274532 0.08438516 0.08188748 0.10827613 0.10006595 0.09745407
0.07904243 0.09105849 0.09434795 0.0698843 ]
mean value: 0.08791472911834716
key: score_time
value: [0.02005672 0.01286626 0.0128572 0.02031779 0.02774405 0.01311469
0.01296258 0.02588487 0.01283789 0.01285267]
mean value: 0.017149472236633302
key: test_mcc
value: [0.65489945 0.68499676 0.71220297 0.73576721 0.78086881 0.68378319
0.72760688 0.67161876 0.74456392 0.65765844]
mean value: 0.7053966384637252
key: train_mcc
value: [0.82394775 0.80531684 0.80123327 0.79450021 0.79161831 0.79274396
0.78937774 0.78455181 0.79802051 0.79422122]
mean value: 0.7975531631116414
key: test_accuracy
value: [0.82706767 0.84210526 0.85606061 0.86363636 0.87878788 0.84090909
0.86363636 0.83333333 0.87121212 0.82575758]
mean value: 0.850250626566416
key: train_accuracy
value: [0.91084945 0.90159798 0.9 0.89663866 0.89495798 0.89579832
0.89411765 0.89159664 0.89831933 0.89663866]
mean value: 0.8980514661709933
key: test_fscore
value: [0.82962963 0.83969466 0.85714286 0.87323944 0.89189189 0.84671533
0.86567164 0.84285714 0.87591241 0.83687943]
mean value: 0.8559634426271225
key: train_fscore
value: [0.91410049 0.90495532 0.90269828 0.89942764 0.89829129 0.898527
0.89689034 0.89469388 0.90122449 0.89909762]
mean value: 0.9009906357661513
key: test_precision
value: [0.8115942 0.859375 0.85074627 0.81578947 0.80487805 0.81690141
0.85294118 0.7972973 0.84507042 0.78666667]
mean value: 0.8241259965440433
key: train_precision
value: [0.88262911 0.8744113 0.87898089 0.87579618 0.87066246 0.87559809
0.87400319 0.86984127 0.87619048 0.87820513]
mean value: 0.875631809174941
key: test_recall
value: [0.84848485 0.82089552 0.86363636 0.93939394 1. 0.87878788
0.87878788 0.89393939 0.90909091 0.89393939]
mean value: 0.8926956128448665
key: train_recall
value: [0.94789916 0.93771044 0.92773109 0.92436975 0.92773109 0.92268908
0.9210084 0.9210084 0.92773109 0.9210084 ]
mean value: 0.9278886908298674
key: test_roc_auc
value: [0.8272275 0.84226594 0.85606061 0.86363636 0.87878788 0.84090909
0.86363636 0.83333333 0.87121212 0.82575758]
mean value: 0.8502826775214836
key: train_roc_auc
value: [0.91081827 0.90162833 0.9 0.89663866 0.89495798 0.89579832
0.89411765 0.89159664 0.89831933 0.89663866]
mean value: 0.8980513821690292
key: test_jcc
value: [0.70886076 0.72368421 0.75 0.775 0.80487805 0.73417722
0.76315789 0.72839506 0.77922078 0.7195122 ]
mean value: 0.7486886164798315
key: train_jcc
value: [0.84179104 0.8264095 0.82265276 0.81723626 0.81536189 0.81575037
0.81305638 0.80945347 0.82020802 0.81669151]
mean value: 0.8198611195150052
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
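The pipeline printed for each model combines a ColumnTransformer (MinMaxScaler on the numerical columns, OneHotEncoder on the categorical ones, remainder='passthrough') with the estimator, and each block ends with MCC and accuracy on a held-out "blind" set. The following is a minimal sketch of that structure under stated assumptions. The toy DataFrame, its column names (num_a, num_b, cat_a), and the way the 80/20 stratified split is made here are hypothetical and not taken from the original script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy data with hypothetical column names (assumption).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
y = (df["num_a"] + 0.5 * rng.normal(size=n) > 0).astype(int)

num_cols = ["num_a", "num_b"]
cat_cols = ["cat_a"]

# Step and transformer names ('prep', 'num', 'cat', 'model') follow the log.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), num_cols),
        ("cat", OneHotEncoder(), cat_cols),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", LinearDiscriminantAnalysis())])

# 80/20 stratified split; the 20% part plays the role of the blind test set.
X_train, X_bts, y_train, y_bts = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_bts)

mcc = matthews_corrcoef(y_bts, y_pred)
acc = accuracy_score(y_bts, y_pred)
print("MCC on Blind test:", round(mcc, 2))
print("Accuracy on Blind test:", round(acc, 2))
```

Fitting the scaler and encoder inside the pipeline means they are learned from the training portion only, so the blind set does not influence preprocessing.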
Model_name: Multinomial
Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', MultinomialNB())])
key: fit_time
value: [0.01525617 0.01732731 0.01737404 0.01752138 0.01738 0.01749825
0.01730013 0.01765752 0.01737523 0.01737618]
mean value: 0.017206621170043946
key: score_time
value: [0.01277661 0.01270509 0.01268816 0.01266551 0.01260066 0.01256776
0.01265335 0.01260543 0.01266003 0.01263642]
mean value: 0.012655901908874511
key: test_mcc
value: [0.56389958 0.58604879 0.36430604 0.51610023 0.5611861 0.563786
0.63665602 0.5153882 0.63900965 0.68378319]
mean value: 0.5630163814123827
key: train_mcc
value: [0.57686697 0.59485056 0.561523 0.55675563 0.56322889 0.58015978
0.55998031 0.58695598 0.57201782 0.55821108]
mean value: 0.5710550001904885
key: test_accuracy
value: [0.78195489 0.78947368 0.68181818 0.75757576 0.78030303 0.78030303
0.81818182 0.75757576 0.81818182 0.84090909]
mean value: 0.7806277056277057
key: train_accuracy
value: [0.78805719 0.79730866 0.78067227 0.77815126 0.78151261 0.78991597
0.77983193 0.79327731 0.78571429 0.7789916 ]
mean value: 0.7853433080549292
key: test_fscore
value: [0.77862595 0.77419355 0.671875 0.75 0.78518519 0.768
0.81538462 0.75384615 0.82608696 0.83464567]
mean value: 0.7757843082814602
key: train_fscore
value: [0.78275862 0.794193 0.77787234 0.77358491 0.77853492 0.78632479
0.77606838 0.78938356 0.78073947 0.77578858]
mean value: 0.7815248554785705
key: test_precision
value: [0.78461538 0.84210526 0.69354839 0.77419355 0.76811594 0.81355932
0.828125 0.765625 0.79166667 0.86885246]
mean value: 0.7930406973003095
key: train_precision
value: [0.80353982 0.80589255 0.78793103 0.78984238 0.78929188 0.8
0.78956522 0.80453752 0.79929577 0.78719723]
mean value: 0.7957093415182501
key: test_recall
value: [0.77272727 0.71641791 0.65151515 0.72727273 0.8030303 0.72727273
0.8030303 0.74242424 0.86363636 0.8030303 ]
mean value: 0.7610357304387155
key: train_recall
value: [0.76302521 0.78282828 0.76806723 0.75798319 0.76806723 0.77310924
0.76302521 0.77478992 0.76302521 0.76470588]
mean value: 0.7678626602156013
key: test_roc_auc
value: [0.78188602 0.79002714 0.68181818 0.75757576 0.78030303 0.78030303
0.81818182 0.75757576 0.81818182 0.84090909]
mean value: 0.7806761646313886
key: train_roc_auc
value: [0.78807826 0.79729649 0.78067227 0.77815126 0.78151261 0.78991597
0.77983193 0.79327731 0.78571429 0.7789916 ]
mean value: 0.7853441982853748
key: test_jcc
value: [0.6375 0.63157895 0.50588235 0.6 0.64634146 0.62337662
0.68831169 0.60493827 0.7037037 0.71621622]
mean value: 0.6357849266937401
key: train_jcc
value: [0.64305949 0.65864023 0.63649025 0.63076923 0.63737796 0.64788732
0.63407821 0.65205092 0.6403385 0.63370474]
mean value: 0.6414396857841679
MCC on Blind test: 0.53
Accuracy on Blind test: 0.81
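The key names in the block above (fit_time, score_time, then test_/train_ pairs per metric) match what scikit-learn's cross_validate returns when it is given a dict of scorers and return_train_score=True. The sketch below reproduces that shape on synthetic data. It is not the author's script: the data, the column names (num_a, num_b, cat_a), the scoring dict and the fold settings are all assumptions made for illustration, and handle_unknown="ignore" is added for safety although the log shows a default OneHotEncoder().

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic, balanced toy data (hypothetical; stands in for the real features).
rng = np.random.default_rng(42)
n = 200
y = np.repeat([0, 1], n // 2)
X = pd.DataFrame({
    "num_a": rng.normal(loc=y, scale=1.0),   # weakly informative numeric feature
    "num_b": rng.normal(size=n),             # noise
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})

# Same layout as the logged pipeline: scale numeric columns, one-hot encode
# categorical ones, pass anything else through unchanged.
prep = ColumnTransformer(
    remainder="passthrough",
    transformers=[
        ("num", MinMaxScaler(), ["num_a", "num_b"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
    ],
)
pipe = Pipeline(steps=[("prep", prep), ("model", MultinomialNB())])

# Assumed scoring dict; the short names become the test_*/train_* key suffixes.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

MinMaxScaler is relevant for this particular model: MultinomialNB rejects negative feature values at fit time, so scaling numeric columns into [0, 1] on each training fold keeps the fit valid.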
Model_name: Passive Aggresive
Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
PassiveAggressiveClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.04524255 0.03324056 0.04321265 0.0383203 0.03985834 0.04182839
0.03704095 0.03285527 0.06639671 0.03593302]
mean value: 0.0413928747177124
key: score_time
value: [0.01156163 0.01269126 0.01272488 0.01269293 0.01273298 0.01271439
0.01266551 0.01269531 0.02094913 0.01287222]
mean value: 0.013430023193359375
key: test_mcc
value: [0.68499676 0.59541983 0.71285802 0.44226898 0.55401326 0.5934603
0.73125738 0.47628967 0.74663552 0.62994079]
mean value: 0.6167140502093844
key: train_mcc
value: [0.76897638 0.71112018 0.75153227 0.55906289 0.53120474 0.69572774
0.71330134 0.49351182 0.75024237 0.67439964]
mean value: 0.6649079378118509
key: test_accuracy
value: [0.84210526 0.79699248 0.85606061 0.6969697 0.73484848 0.78787879
0.85606061 0.71212121 0.87121212 0.8030303 ]
mean value: 0.7957279562542721
key: train_accuracy
value: [0.88393608 0.84693019 0.87563025 0.74537815 0.7210084 0.83193277
0.84789916 0.70336134 0.87478992 0.82689076]
mean value: 0.8157757030482504
key: test_fscore
value: [0.84444444 0.8057554 0.85271318 0.60784314 0.79041916 0.81081081
0.8707483 0.62745098 0.864 0.77192982]
mean value: 0.7846115232438119
key: train_fscore
value: [0.88707038 0.86191199 0.87728027 0.66519337 0.78157895 0.85380117
0.86298259 0.58616647 0.872103 0.80268199]
mean value: 0.805077017361187
key: test_precision
value: [0.82608696 0.77777778 0.87301587 0.86111111 0.65346535 0.73170732
0.79012346 0.88888889 0.91525424 0.91666667]
mean value: 0.823409763166814
key: train_precision
value: [0.86443381 0.78453039 0.86579378 0.97096774 0.64216216 0.75549806
0.78512397 0.96899225 0.89122807 0.93318486]
mean value: 0.8461915083249473
key: test_recall
value: [0.86363636 0.8358209 0.83333333 0.46969697 1. 0.90909091
0.96969697 0.48484848 0.81818182 0.66666667]
mean value: 0.7850972410673903
key: train_recall
value: [0.91092437 0.95622896 0.88907563 0.50588235 0.99831933 0.98151261
0.95798319 0.42016807 0.85378151 0.70420168]
mean value: 0.8178077695724755
key: test_roc_auc
value: [0.84226594 0.79669833 0.85606061 0.6969697 0.73484848 0.78787879
0.85606061 0.71212121 0.87121212 0.8030303 ]
mean value: 0.7957146087743103
key: train_roc_auc
value: [0.88391336 0.84702204 0.87563025 0.74537815 0.7210084 0.83193277
0.84789916 0.70336134 0.87478992 0.82689076]
mean value: 0.8157826160767337
key: test_jcc
value: [0.73076923 0.6746988 0.74324324 0.43661972 0.65346535 0.68181818
0.77108434 0.45714286 0.76056338 0.62857143]
mean value: 0.6537976519201265
key: train_jcc
value: [0.79705882 0.75733333 0.78138848 0.49834437 0.64146868 0.74489796
0.75898802 0.4145937 0.77321157 0.6704 ]
mean value: 0.6837684929881324
MCC on Blind test: 0.41
Accuracy on Blind test: 0.79
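The fourth-fold test values in the block above (recall 0.46969697, precision 0.86111111, accuracy 0.6969697) are consistent with a confusion matrix of TP=31, FN=35, FP=5, TN=61 on a 66/66 fold. These counts are inferred from the logged ratios; the log does not print them. The sketch below recomputes MCC and the Jaccard score from those counts. It also shows that a ROC AUC computed from hard class predictions reduces to the mean of sensitivity and specificity, which equals accuracy on a balanced fold. That would explain why test_roc_auc and test_accuracy coincide in most folds, but how the scorer was actually configured is an assumption.

```python
from math import sqrt

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def jaccard(tp, fn, fp):
    """Jaccard score for the positive class: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

def auc_from_hard_labels(tp, fn, fp, tn):
    """ROC AUC when the 'scores' are 0/1 predictions: (TPR + TNR) / 2."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Counts inferred from the logged fold-4 ratios (an assumption, see lead-in).
tp, fn, fp, tn = 31, 35, 5, 61

fold_mcc = mcc(tp, fn, fp, tn)
fold_jcc = jaccard(tp, fn, fp)
fold_auc = auc_from_hard_labels(tp, fn, fp, tn)
fold_acc = (tp + tn) / (tp + fn + fp + tn)

print(f"mcc={fold_mcc:.8f} jcc={fold_jcc:.8f} auc={fold_auc:.8f} acc={fold_acc:.8f}")
```

Under this reading, the roc_auc rows carry no information beyond accuracy for balanced folds; a threshold-free AUC would need predict_proba or decision_function scores instead of predicted labels.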
Model_name: Stochastic GDescent
Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', SGDClassifier(n_jobs=10, random_state=42))])
key: fit_time
value: [0.04435778 0.07317758 0.05246305 0.03744841 0.05032897 0.07695699
0.04261231 0.04731846 0.0365212 0.0385685 ]
mean value: 0.04997532367706299
key: score_time
value: [0.01271534 0.01193547 0.01296759 0.012784 0.02681756 0.01191354
0.01280212 0.01282024 0.02049041 0.01277399]
mean value: 0.014802026748657226
key: test_mcc
value: [0.51226807 0.62415197 0.69544219 0.73029674 0.47140452 0.63362511
0.62407827 0.55301004 0.45742763 0.5934603 ]
mean value: 0.5895164834865797
key: train_mcc
value: [0.61208229 0.80123118 0.71239844 0.7731977 0.48521622 0.72253313
0.63418446 0.60364035 0.53398461 0.63871402]
mean value: 0.6517182402152969
key: test_accuracy
value: [0.72932331 0.81203008 0.84090909 0.86363636 0.68181818 0.81060606
0.78030303 0.75757576 0.68939394 0.78787879]
mean value: 0.7753474595579859
key: train_accuracy
value: [0.78301093 0.8999159 0.84453782 0.88655462 0.69159664 0.85294118
0.78823529 0.7789916 0.73109244 0.8 ]
mean value: 0.8056876409100225
key: test_fscore
value: [0.64705882 0.81203008 0.85517241 0.86956522 0.75862069 0.78991597
0.81987578 0.7037037 0.56842105 0.75862069]
mean value: 0.7582984408331488
key: train_fscore
value: [0.73236515 0.90269828 0.86204325 0.88569009 0.76398714 0.83537159
0.82475661 0.72689512 0.64125561 0.75862069]
mean value: 0.793368352154183
key: test_precision
value: [0.91666667 0.81818182 0.78481013 0.83333333 0.61111111 0.88679245
0.69473684 0.9047619 0.93103448 0.88 ]
mean value: 0.8261428738331185
key: train_precision
value: [0.95663957 0.87758347 0.77479893 0.89249147 0.61875 0.94871795
0.70344009 0.95108696 0.96296296 0.95652174]
mean value: 0.8642993129637412
key: test_recall
value: [0.5 0.80597015 0.93939394 0.90909091 1. 0.71212121
1. 0.57575758 0.40909091 0.66666667]
mean value: 0.7518091361374943
key: train_recall
value: [0.59327731 0.92929293 0.97142857 0.8789916 0.99831933 0.74621849
0.99663866 0.58823529 0.48067227 0.62857143]
mean value: 0.78116458704694
key: test_roc_auc
value: [0.72761194 0.81207598 0.84090909 0.86363636 0.68181818 0.81060606
0.78030303 0.75757576 0.68939394 0.78787879]
mean value: 0.7751809136137494
key: train_roc_auc
value: [0.78317064 0.89994058 0.84453782 0.88655462 0.69159664 0.85294118
0.78823529 0.7789916 0.73109244 0.8 ]
mean value: 0.8057060804119628
key: test_jcc
value: [0.47826087 0.6835443 0.74698795 0.76923077 0.61111111 0.65277778
0.69473684 0.54285714 0.39705882 0.61111111]
mean value: 0.6187676702892502
key: train_jcc
value: [0.57774141 0.82265276 0.75753604 0.79483283 0.61810614 0.71728595
0.70177515 0.57096248 0.47194719 0.61111111]
mean value: 0.6643951051173903
MCC on Blind test: 0.6
Accuracy on Blind test: 0.85
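The mean values in the block above hide a wide fold-to-fold spread for this model: test_recall ranges from about 0.41 to 1.0. A small sketch, using only the ten test_recall values printed above, recomputes the logged mean and adds the spread, which the log does not report. Reporting a standard deviation alongside the mean is a suggestion, not something the original script does.

```python
from statistics import mean, stdev

# test_recall values copied from the Stochastic GDescent block above.
test_recall = [
    0.5, 0.80597015, 0.93939394, 0.90909091, 1.0,
    0.71212121, 1.0, 0.57575758, 0.40909091, 0.66666667,
]

recall_mean = mean(test_recall)
recall_sd = stdev(test_recall)          # sample standard deviation (n - 1)
recall_range = (min(test_recall), max(test_recall))

print("mean value:", recall_mean)
print("std value:", recall_sd)
print("min/max:", recall_range)
```

A spread this large across folds suggests the default SGDClassifier settings have not converged to a stable decision boundary on this data, so the mean alone is a weak basis for ranking it against the ensemble models.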
Model_name: AdaBoost Classifier
Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', AdaBoostClassifier(random_state=42))])
key: fit_time
value: [0.37129021 0.35578799 0.35703659 0.34674764 0.33202577 0.35261488
0.33393955 0.33623791 0.33627582 0.33462644]
mean value: 0.345658278465271
key: score_time
value: [0.01794934 0.01799059 0.01849675 0.01664996 0.01663804 0.01741576
0.01670289 0.0176537 0.017905 0.01648688]
mean value: 0.01738889217376709
key: test_mcc
value: [0.86718264 0.88011764 0.86373551 0.89651574 0.82773811 0.8196886
0.81855773 0.833429 0.85839508 0.833429 ]
mean value: 0.8498789066655591
key: train_mcc
value: [0.91988743 0.9180489 0.9179388 0.9179388 0.92357126 0.90120825
0.93138846 0.91138916 0.92810999 0.928425 ]
mean value: 0.9197906059991515
key: test_accuracy
value: [0.93233083 0.93984962 0.93181818 0.9469697 0.90909091 0.90909091
0.90909091 0.91666667 0.92424242 0.91666667]
mean value: 0.9235816814764183
key: train_accuracy
value: [0.95962994 0.9587889 0.95882353 0.95882353 0.96134454 0.95042017
0.96554622 0.95546218 0.96386555 0.96386555]
mean value: 0.9596570099865009
key: test_fscore
value: [0.93430657 0.93939394 0.93233083 0.94890511 0.91549296 0.91176471
0.91044776 0.91729323 0.92957746 0.91729323]
mean value: 0.9256805801070733
key: train_fscore
value: [0.96039604 0.95940348 0.9593361 0.9593361 0.96217105 0.95111848
0.9659751 0.95616212 0.96437448 0.9645507 ]
mean value: 0.9602823650782725
key: test_precision
value: [0.90140845 0.95384615 0.92537313 0.91549296 0.85526316 0.88571429
0.89705882 0.91044776 0.86842105 0.91044776]
mean value: 0.9023473538783289
key: train_precision
value: [0.94327391 0.94453507 0.94754098 0.94754098 0.94202899 0.9379085
0.95409836 0.94136808 0.95098039 0.94660194]
mean value: 0.9455877201594677
key: test_recall
value: [0.96969697 0.92537313 0.93939394 0.98484848 0.98484848 0.93939394
0.92424242 0.92424242 1. 0.92424242]
mean value: 0.9516282225237449
key: train_recall
value: [0.97815126 0.97474747 0.97142857 0.97142857 0.98319328 0.96470588
0.97815126 0.97142857 0.97815126 0.98319328]
mean value: 0.9754579407520584
key: test_roc_auc
value: [0.93260968 0.93995929 0.93181818 0.9469697 0.90909091 0.90909091
0.90909091 0.91666667 0.92424242 0.91666667]
mean value: 0.9236205336951606
key: train_roc_auc
value: [0.95961435 0.95880231 0.95882353 0.95882353 0.96134454 0.95042017
0.96554622 0.95546218 0.96386555 0.96386555]
mean value: 0.9596567920097332
key: test_jcc
value: [0.87671233 0.88571429 0.87323944 0.90277778 0.84415584 0.83783784
0.83561644 0.84722222 0.86842105 0.84722222]
mean value: 0.8618919446304775
key: train_jcc
value: [0.92380952 0.92197452 0.92185008 0.92185008 0.92709984 0.90679305
0.93418941 0.91600634 0.9312 0.93152866]
mean value: 0.9236301503750806
MCC on Blind test: 0.8
Accuracy on Blind test: 0.92
Model_name: Bagging Classifier
Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide
oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model',
BaggingClassifier(n_jobs=10, oob_score=True,
random_state=42))])
key: fit_time
value: [0.12730455 0.2520771 0.24463916 0.23580003 0.25089359 0.23320365
0.24741626 0.24575734 0.2396307 0.13593245]
mean value: 0.22126548290252684
key: score_time
value: [0.03831649 0.03454685 0.03999615 0.03003001 0.03436136 0.04432249
0.03896332 0.03376579 0.03437304 0.03488183]
mean value: 0.03635573387145996
key: test_mcc
value: [0.85953823 0.89484396 0.91076511 0.92690611 0.85478752 0.85478752
0.94112395 0.92690611 0.87177979 0.89651574]
mean value: 0.8937954025367763
key: train_mcc
value: [0.99495514 0.99495514 0.99495939 0.99495939 0.99663866 0.99327731
0.99160924 0.99663866 0.99664429 0.99497063]
mean value: 0.9949607836874935
key: test_accuracy
value: [0.92481203 0.94736842 0.95454545 0.96212121 0.92424242 0.92424242
0.96969697 0.96212121 0.93181818 0.9469697 ]
mean value: 0.9447938026885395
key: train_accuracy
value: [0.99747687 0.99747687 0.99747899 0.99747899 0.99831933 0.99663866
0.99579832 0.99831933 0.99831933 0.99747899]
mean value: 0.9974785675413984
key: test_fscore
value: [0.92957746 0.94736842 0.95588235 0.96350365 0.92857143 0.92857143
0.97058824 0.96350365 0.93617021 0.94890511]
mean value: 0.9472641952744596
key: train_fscore
value: [0.99748111 0.99747262 0.99748111 0.99748111 0.99831933 0.99663866
0.99580889 0.99831933 0.99832215 0.99748533]
mean value: 0.9974809619824477
key: test_precision
value: [0.86842105 0.95454545 0.92857143 0.92957746 0.87837838 0.87837838
0.94285714 0.92957746 0.88 0.91549296]
mean value: 0.9105799722686305
key: train_precision
value: [0.9966443 0.99831366 0.9966443 0.9966443 0.99831933 0.99663866
0.99331104 0.99831933 0.99664992 0.99498328]
mean value: 0.9966468086818777
key: test_recall
value: [1. 0.94029851 0.98484848 1. 0.98484848 0.98484848
1. 1. 1. 0.98484848]
mean value: 0.9879692446856626
key: train_recall
value: [0.99831933 0.996633 0.99831933 0.99831933 0.99831933 0.99663866
0.99831933 0.99831933 1. 1. ]
mean value: 0.9983187618481736
key: test_roc_auc
value: [0.92537313 0.94742198 0.95454545 0.96212121 0.92424242 0.92424242
0.96969697 0.96212121 0.93181818 0.9469697 ]
mean value: 0.9448552691090004
key: train_roc_auc
value: [0.99747616 0.99747616 0.99747899 0.99747899 0.99831933 0.99663866
0.99579832 0.99831933 0.99831933 0.99747899]
mean value: 0.9974784257137198
key: test_jcc
value: [0.86842105 0.9 0.91549296 0.92957746 0.86666667 0.86666667
0.94285714 0.92957746 0.88 0.90277778]
mean value: 0.9002037193923776
key: train_jcc
value: [0.99497487 0.99495798 0.99497487 0.99497487 0.9966443 0.99329983
0.99165275 0.9966443 0.99664992 0.99498328]
mean value: 0.9949756977839559
MCC on Blind test: 0.77
Accuracy on Blind test: 0.91
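The "Some inputs do not have OOB scores" warning emitted during this run follows from the arithmetic of bootstrap sampling. BaggingClassifier is created here without n_estimators, so it uses the scikit-learn default of 10. With so few estimators, some training rows are drawn into every bootstrap sample and never get an out-of-bag prediction, and the subsequent division by a zero row sum produces the RuntimeWarning. The sketch below estimates how many rows are affected. The training-fold size of 1190 is an approximation inferred from the logged ratios, not a value the log states.

```python
def prob_always_in_bag(n_samples: int, n_estimators: int) -> float:
    """Probability that one row appears in every bootstrap sample.

    Each bootstrap draws n_samples rows with replacement, so a given row is
    left out of one sample with probability (1 - 1/n)^n, roughly exp(-1).
    """
    p_in_one = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_one ** n_estimators

n_train = 1190  # assumed training-fold size (approximate)

expected_missing_10 = n_train * prob_always_in_bag(n_train, 10)
expected_missing_100 = n_train * prob_always_in_bag(n_train, 100)

print("expected rows without OOB, 10 estimators:", expected_missing_10)
print("expected rows without OOB, 100 estimators:", expected_missing_100)
```

With 10 estimators the expected count is on the order of ten rows per fit, which is why the warning appears on every fold. Raising n_estimators to around 100 makes it negligible, or oob_score=True can be dropped if the OOB estimate is not used downstream.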
Model_name: Gaussian Process
Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GaussianProcessClassifier(random_state=42))])
key: fit_time
value: [0.76625323 0.93301606 0.85965228 0.68952823 0.66378808 0.87883234
0.7050457 0.75222564 0.88421106 0.79904914]
mean value: 0.7931601762771606
key: score_time
value: [0.05368519 0.07283998 0.06703472 0.05444479 0.05356479 0.05195475
0.0593667 0.05376959 0.05021644 0.02867389]
mean value: 0.054555082321166994
key: test_mcc
value: [0.82301362 0.79255617 0.75792383 0.78368849 0.80758535 0.7800135
0.83806027 0.81442137 0.80123362 0.82425939]
mean value: 0.8022755606617251
key: train_mcc
value: [0.95680613 0.96343671 0.95828776 0.958472 0.95359324 0.96658315
0.96157408 0.96183506 0.96346619 0.96682907]
mean value: 0.9610883387492701
key: test_accuracy
value: [0.90977444 0.89473684 0.87878788 0.88636364 0.90151515 0.88636364
0.91666667 0.90151515 0.89393939 0.90909091]
mean value: 0.8978753702437913
key: train_accuracy
value: [0.97813288 0.98149706 0.9789916 0.9789916 0.97647059 0.98319328
0.98067227 0.98067227 0.98151261 0.98319328]
mean value: 0.9803327420118594
key: test_fscore
value: [0.91304348 0.9 0.88059701 0.8951049 0.90647482 0.89361702
0.92086331 0.90909091 0.90277778 0.91428571]
mean value: 0.9035854940218537
key: train_fscore
value: [0.9785124 0.98175788 0.97925311 0.97932175 0.97689769 0.98336106
0.98088113 0.98097601 0.98178808 0.98344371]
mean value: 0.9806192826004415
key: test_precision
value: [0.875 0.8630137 0.86764706 0.83116883 0.8630137 0.84
0.87671233 0.84415584 0.83333333 0.86486486]
mean value: 0.85589096583738
key: train_precision
value: [0.96260163 0.96732026 0.96721311 0.96416938 0.95948136 0.97364086
0.97039474 0.96579805 0.96737357 0.96900489]
mean value: 0.9666997850416796
key: test_recall
value: [0.95454545 0.94029851 0.89393939 0.96969697 0.95454545 0.95454545
0.96969697 0.98484848 0.98484848 0.96969697]
mean value: 0.9576662143826323
key: train_recall
value: [0.99495798 0.996633 0.99159664 0.99495798 0.99495798 0.99327731
0.99159664 0.99663866 0.99663866 0.99831933]
mean value: 0.9949574173103585
key: test_roc_auc
value: [0.91010855 0.89439168 0.87878788 0.88636364 0.90151515 0.88636364
0.91666667 0.90151515 0.89393939 0.90909091]
mean value: 0.8978742650384441
key: train_roc_auc
value: [0.97811872 0.98150978 0.9789916 0.9789916 0.97647059 0.98319328
0.98067227 0.98067227 0.98151261 0.98319328]
mean value: 0.9803325976855389
key: test_jcc
value: [0.84 0.81818182 0.78666667 0.81012658 0.82894737 0.80769231
0.85333333 0.83333333 0.82278481 0.84210526]
mean value: 0.824317148319147
key: train_jcc
value: [0.9579288 0.96416938 0.95934959 0.95948136 0.95483871 0.96726678
0.96247961 0.96266234 0.96422764 0.96742671]
mean value: 0.9619830922592865
MCC on Blind test: 0.59
Accuracy on Blind test: 0.84
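Note on the block layout: the key / value / mean value triplets above match the shape of the dictionary returned by scikit-learn's `cross_validate` when it is given a multi-metric scoring dict and `return_train_score=True`. The sketch below reproduces that layout under stated assumptions. The synthetic data, the column names (`num_0` ... `cat_0`), the `scoring` dict, and the `StratifiedKFold` settings are hypothetical; the log only shows the pipeline repr and ten values per key, not the script's actual call.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical stand-in data: 5 numerical columns plus 1 categorical column.
X_num, y = make_classification(n_samples=200, n_features=5, random_state=42)
X = pd.DataFrame(X_num, columns=[f"num_{i}" for i in range(5)])
rng = np.random.default_rng(42)
X["cat_0"] = rng.choice(["a", "b", "c"], size=len(X))

num_cols = [c for c in X.columns if c.startswith("num_")]
cat_cols = ["cat_0"]

# Same step names and transformers as the pipeline repr printed in the log.
pipe = Pipeline(steps=[
    ("prep", ColumnTransformer(remainder="passthrough",
                               transformers=[("num", MinMaxScaler(), num_cols),
                                             ("cat", OneHotEncoder(), cat_cols)])),
    ("model", GaussianProcessClassifier(random_state=42)),
])

# Assumed scoring dict; the names are chosen so that the result keys
# (test_mcc, train_mcc, ..., test_jcc, train_jcc) match the log.
scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": "jaccard",
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring,
                        return_train_score=True)

# Print in the same key / value / mean value layout as the log.
for key, value in scores.items():
    print("key:", key)
    print("value:", value)
    print("mean value:", np.mean(value))
```

Each array holds one entry per fold, which is why every value line in this log carries ten numbers.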
Model_name: Gradient Boosting
Model func: GradientBoostingClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', GradientBoostingClassifier(random_state=42))])
key: fit_time
value: [1.54636669 1.56205773 1.50813985 1.55183625 1.58534145 1.55194783
1.54392004 1.56743145 1.5406394 1.5221889 ]
mean value: 1.5479869604110719
key: score_time
value: [0.01014566 0.00983214 0.00978208 0.01077819 0.01102424 0.00981069
0.01092172 0.01045537 0.01038098 0.00968528]
mean value: 0.010281634330749512
key: test_mcc
value: [0.869585 0.89484396 0.92690611 0.91287093 0.88531564 0.89901011
0.91287093 0.8824419 0.88531564 0.91076511]
mean value: 0.8979925323640788
key: train_mcc
value: [0.97672119 0.9683493 0.97666924 0.97009871 0.97009871 0.97001645
0.97495654 0.97166052 0.97666924 0.97173741]
mean value: 0.9726977310015105
key: test_accuracy
value: [0.93233083 0.94736842 0.96212121 0.95454545 0.93939394 0.9469697
0.95454545 0.93939394 0.93939394 0.95454545]
mean value: 0.9470608339029392
key: train_accuracy
value: [0.9882254 0.98402019 0.98823529 0.98487395 0.98487395 0.98487395
0.98739496 0.98571429 0.98823529 0.98571429]
mean value: 0.9862161550911366
key: test_fscore
value: [0.9352518 0.94736842 0.96350365 0.95652174 0.94285714 0.94964029
0.95652174 0.94202899 0.94285714 0.95588235]
mean value: 0.9492433259442181
key: train_fscore
value: [0.98837209 0.98420615 0.98835275 0.98507463 0.98507463 0.98504983
0.98751041 0.98586866 0.98835275 0.98589212]
mean value: 0.986375400863372
key: test_precision
value: [0.89041096 0.95454545 0.92957746 0.91666667 0.89189189 0.90410959
0.91666667 0.90277778 0.89189189 0.92857143]
mean value: 0.9127109790745715
key: train_precision
value: [0.97701149 0.97208539 0.9785832 0.97217676 0.97217676 0.97372742
0.97854785 0.97532895 0.9785832 0.97377049]
mean value: 0.9751991507005686
key: test_recall
value: [0.98484848 0.94029851 1. 1. 1. 1.
1. 0.98484848 1. 0.98484848]
mean value: 0.9894843962008141
key: train_recall
value: [1. 0.996633 0.99831933 0.99831933 0.99831933 0.99663866
0.99663866 0.99663866 0.99831933 0.99831933]
mean value: 0.9978145601675014
key: test_roc_auc
value: [0.93272275 0.94742198 0.96212121 0.95454545 0.93939394 0.9469697
0.95454545 0.93939394 0.93939394 0.95454545]
mean value: 0.9471053821800091
key: train_roc_auc
value: [0.98821549 0.98403078 0.98823529 0.98487395 0.98487395 0.98487395
0.98739496 0.98571429 0.98823529 0.98571429]
mean value: 0.9862162238632827
key: test_jcc
value: [0.87837838 0.9 0.92957746 0.91666667 0.89189189 0.90410959
0.91666667 0.89041096 0.89189189 0.91549296]
mean value: 0.9035086465975912
key: train_jcc
value: [0.97701149 0.96890344 0.97697368 0.97058824 0.97058824 0.9705401
0.97532895 0.97213115 0.97697368 0.97217676]
mean value: 0.9731215722770584
MCC on Blind test: 0.78
Accuracy on Blind test: 0.91
Model_name: QDA
Model func: QuadraticDiscriminantAnalysis()
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
warnings.warn("Variables are collinear")
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', QuadraticDiscriminantAnalysis())])
key: fit_time
value: [0.07663488 0.0694983 0.05319428 0.04964781 0.07832623 0.0508635
0.04934716 0.07398272 0.073277 0.05942822]
mean value: 0.0634200096130371
key: score_time
value: [0.02299166 0.01402068 0.01389337 0.01395512 0.01391602 0.01397276
0.02293348 0.02380395 0.01953149 0.01425648]
mean value: 0.017327499389648438
key: test_mcc
value: [0.23393668 0.21899752 0.34444748 0.25400025 0.1717795 0.25400025
0.23664319 0.23664319 0.2466911 0.19682713]
mean value: 0.23939663017861
key: train_mcc
value: [0.25597326 0.26474332 0.29790362 0.2611946 0.26484691 0.25750387
0.26665917 0.26665917 0.2611946 0.25750387]
mean value: 0.26541823911621043
key: test_accuracy
value: [0.54887218 0.54887218 0.60606061 0.56060606 0.54545455 0.56060606
0.5530303 0.5530303 0.56818182 0.56060606]
mean value: 0.5605320118478013
key: train_accuracy
value: [0.56181665 0.56518082 0.58151261 0.56386555 0.56554622 0.56218487
0.56638655 0.56638655 0.56386555 0.56218487]
mean value: 0.5658930249980564
key: test_fscore
value: [0.6875 0.69072165 0.7173913 0.69473684 0.68085106 0.69473684
0.69109948 0.69109948 0.69518717 0.68478261]
mean value: 0.692810642922331
key: train_fscore
value: [0.69549971 0.69677419 0.7049763 0.69631363 0.69712947 0.69549971
0.6975381 0.6975381 0.69631363 0.69549971]
mean value: 0.6973082556135722
key: test_precision
value: [0.52380952 0.52755906 0.55932203 0.53225806 0.52459016 0.53225806
0.528 0.528 0.53719008 0.53389831]
mean value: 0.5326885293521997
key: train_precision
value: [0.53315412 0.53465347 0.54437328 0.53411131 0.53507194 0.53315412
0.53555356 0.53555356 0.53411131 0.53315412]
mean value: 0.5352890789817935
key: test_recall
value: [1. 1. 1. 1. 0.96969697 1.
1. 1. 0.98484848 0.95454545]
mean value: 0.990909090909091
key: train_recall
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0
key: test_roc_auc
value: [0.55223881 0.54545455 0.60606061 0.56060606 0.54545455 0.56060606
0.5530303 0.5530303 0.56818182 0.56060606]
mean value: 0.5605269109000452
key: train_roc_auc
value: [0.56144781 0.56554622 0.58151261 0.56386555 0.56554622 0.56218487
0.56638655 0.56638655 0.56386555 0.56218487]
mean value: 0.565892680304445
key: test_jcc
value: [0.52380952 0.52755906 0.55932203 0.53225806 0.51612903 0.53225806
0.528 0.528 0.53278689 0.52066116]
mean value: 0.5300783816386957
key: train_jcc
value: [0.53315412 0.53465347 0.54437328 0.53411131 0.53507194 0.53315412
0.53555356 0.53555356 0.53411131 0.53315412]
mean value: 0.5352890789817935
MCC on Blind test: 0.12
Accuracy on Blind test: 0.39
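Note on the QDA scores: in the folds where test_recall is 1, test_jcc equals test_precision (for example 0.52380952 in the first fold), and train_jcc equals train_precision throughout. This follows from the definitions: Jaccard = TP / (TP + FP + FN) and precision = TP / (TP + FP), so the two coincide whenever FN = 0. The labels below are made up purely to illustrate the arithmetic and are not taken from the run.

```python
from sklearn.metrics import jaccard_score, precision_score, recall_score

# Hypothetical labels: every positive is found (FN = 0), with two false positives.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred_all_found = [1, 1, 1, 1, 1, 0, 0]   # TP=3, FP=2, FN=0

recall_a = recall_score(y_true, y_pred_all_found)
precision_a = precision_score(y_true, y_pred_all_found)
jaccard_a = jaccard_score(y_true, y_pred_all_found)

# Same number of false positives, but one positive is now missed (FN = 1).
y_pred_one_missed = [1, 1, 0, 1, 1, 0, 0]  # TP=2, FP=2, FN=1

precision_b = precision_score(y_true, y_pred_one_missed)
jaccard_b = jaccard_score(y_true, y_pred_one_missed)

print("recall=1 case: precision", precision_a, "jaccard", jaccard_a)
print("recall<1 case: precision", precision_b, "jaccard", jaccard_b)
```

Once a fold has a false negative, Jaccard drops below precision, which matches the folds above where test_recall is 0.96969697 and test_jcc (0.51612903) is lower than test_precision (0.52459016).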
Model_name: Ridge Classifier
Model func: RidgeClassifier(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifier(random_state=42))])
key: fit_time
value: [0.06939888 0.05537415 0.05750942 0.05315638 0.05297065 0.04990363
0.04437757 0.03693891 0.05415535 0.05993485]
mean value: 0.053371977806091306
key: score_time
value: [0.02014852 0.02306843 0.0202558 0.02007866 0.02018118 0.02053142
0.02035832 0.022403 0.02546763 0.02311039]
mean value: 0.02156033515930176
key: test_mcc
value: [0.68430574 0.64039914 0.74250948 0.78368849 0.75725927 0.71285802
0.75897093 0.67161876 0.76072577 0.74663552]
mean value: 0.7258971118039523
key: train_mcc
value: [0.77315466 0.78527719 0.77672743 0.78455181 0.7648432 0.78328462
0.77766758 0.7771158 0.7797431 0.77335768]
mean value: 0.7775723077305859
key: test_accuracy
value: [0.84210526 0.81954887 0.87121212 0.88636364 0.87121212 0.85606061
0.87878788 0.83333333 0.87878788 0.87121212]
mean value: 0.8608623832308043
key: train_accuracy
value: [0.88561817 0.89150547 0.88739496 0.89159664 0.88151261 0.8907563
0.88823529 0.88739496 0.88907563 0.88571429]
mean value: 0.8878804305574206
key: test_fscore
value: [0.84210526 0.81538462 0.87022901 0.8951049 0.88275862 0.85925926
0.88235294 0.84285714 0.88405797 0.87769784]
mean value: 0.8651807558004633
key: train_fscore
value: [0.88961039 0.89537713 0.89123377 0.89469388 0.88545898 0.89430894
0.89125102 0.89158576 0.89250814 0.88961039]
mean value: 0.891563839740782
key: test_precision
value: [0.8358209 0.84126984 0.87692308 0.83116883 0.81012658 0.84057971
0.85714286 0.7972973 0.84722222 0.83561644]
mean value: 0.8373167752326087
key: train_precision
value: [0.86028257 0.86384977 0.86185243 0.86984127 0.85691824 0.86614173
0.86783439 0.85959438 0.8657188 0.86028257]
mean value: 0.8632316166842141
key: test_recall
value: [0.84848485 0.79104478 0.86363636 0.96969697 0.96969697 0.87878788
0.90909091 0.89393939 0.92424242 0.92424242]
mean value: 0.8972862957937585
key: train_recall
value: [0.9210084 0.92929293 0.92268908 0.9210084 0.91596639 0.92436975
0.91596639 0.92605042 0.9210084 0.9210084 ]
mean value: 0.921836855954503
key: test_roc_auc
value: [0.84215287 0.81976481 0.87121212 0.88636364 0.87121212 0.85606061
0.87878788 0.83333333 0.87878788 0.87121212]
mean value: 0.8608887381275441
key: train_roc_auc
value: [0.88558838 0.89153722 0.88739496 0.89159664 0.88151261 0.8907563
0.88823529 0.88739496 0.88907563 0.88571429]
mean value: 0.887880626998274
key: test_jcc
value: [0.72727273 0.68831169 0.77027027 0.81012658 0.79012346 0.75324675
0.78947368 0.72839506 0.79220779 0.78205128]
mean value: 0.7631479298368039
key: train_jcc
value: [0.80116959 0.81057269 0.80380673 0.80945347 0.79446064 0.80882353
0.80383481 0.80437956 0.80588235 0.80116959]
mean value: 0.8043552968756095
MCC on Blind test: 0.65
Accuracy on Blind test: 0.86
Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5,
n_estimators=1000, n_jobs=10, oob_score=True,
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', use_label_encoder=False,
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough',
transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
'mcsm_na_affinity', 'mcsm_ppi2_affinity',
...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
dtype='object', length=169)),
('cat', OneHotEncoder(),
Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
dtype='object'))])),
('model', RidgeClassifierCV(cv=10))])
key: fit_time
value: [0.6347847 0.57954597 0.46806121 0.50969672 0.58284903 0.64407444
0.56478858 0.61213517 0.66929221 0.59609604]
mean value: 0.5861324071884155
key: score_time
value: [0.02030373 0.02027869 0.02028942 0.02028108 0.02024817 0.02016139
0.02032113 0.02037835 0.02039146 0.02017832]
mean value: 0.02028317451477051
key: test_mcc
value: [0.68430574 0.64039914 0.72760688 0.78368849 0.75725927 0.71285802
0.71285802 0.67161876 0.76072577 0.74663552]
mean value: 0.7197955608042741
key: train_mcc
value: [0.77315466 0.78527719 0.78968204 0.78455181 0.7648432 0.78328462
0.78762435 0.7771158 0.7797431 0.77335768]
mean value: 0.7798634449591065
key: test_accuracy
value: [0.84210526 0.81954887 0.86363636 0.88636364 0.87121212 0.85606061
0.85606061 0.83333333 0.87878788 0.87121212]
mean value: 0.8578320802005013
key: train_accuracy
value: [0.88561817 0.89150547 0.89411765 0.89159664 0.88151261 0.8907563
0.89327731 0.88739496 0.88907563 0.88571429]
mean value: 0.8890569011456559
key: test_fscore
value: [0.84210526 0.81538462 0.86153846 0.8951049 0.88275862 0.85925926
0.85925926 0.84285714 0.88405797 0.87769784]
mean value: 0.8620023329992295
key: train_fscore
value: [0.88961039 0.89537713 0.89722675 0.89469388 0.88545898 0.89430894
0.8959869 0.89158576 0.89250814 0.88961039]
mean value: 0.8926367258754563
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:196: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_8020.py:199: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision
value: [0.8358209 0.84126984 0.875 0.83116883 0.81012658 0.84057971
0.84057971 0.7972973 0.84722222 0.83561644]
mean value: 0.835468152840508
key: train_precision
value: [0.86028257 0.86384977 0.87163233 0.86984127 0.85691824 0.86614173
0.87380192 0.85959438 0.8657188 0.86028257]
mean value: 0.8648063585225085
key: test_recall
value: [0.84848485 0.79104478 0.84848485 0.96969697 0.96969697 0.87878788
0.87878788 0.89393939 0.92424242 0.92424242]
mean value: 0.8927408412483039
key: train_recall
value: [0.9210084 0.92929293 0.92436975 0.9210084 0.91596639 0.92436975
0.91932773 0.92605042 0.9210084 0.9210084 ]
mean value: 0.9223410576351753
key: test_roc_auc
value: [0.84215287 0.81976481 0.86363636 0.88636364 0.87121212 0.85606061
0.85606061 0.83333333 0.87878788 0.87121212]
mean value: 0.8578584350972411
key: train_roc_auc
value: [0.88558838 0.89153722 0.89411765 0.89159664 0.88151261 0.8907563
0.89327731 0.88739496 0.88907563 0.88571429]
mean value: 0.8890570975865093
key: test_jcc
value: [0.72727273 0.68831169 0.75675676 0.81012658 0.79012346 0.75324675
0.75324675 0.72839506 0.79220779 0.78205128]
mean value: 0.7581738853890753
key: train_jcc
value: [0.80116959 0.81057269 0.81360947 0.80945347 0.79446064 0.80882353
0.8115727 0.80437956 0.80588235 0.80116959]
mean value: 0.8061093593256186
MCC on Blind test: 0.65
Accuracy on Blind test: 0.86
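Note on the SettingWithCopyWarning raised for `rouC_CT.sort_values(..., inplace = True)` and `rouC_BT.sort_values(..., inplace = True)`: pandas emits it when an in-place operation is applied to a DataFrame that was itself obtained by slicing another one. One common way to avoid it is to assign the sorted result to a new name instead of sorting in place. The sketch below assumes a hypothetical combined table `all_scores`; only the names `rouC_CT`, `rouC_BT`, `test_mcc` and `bts_mcc` come from the log, and the numbers are the rounded mean test_mcc and blind-test MCC values printed above.

```python
import pandas as pd

# Hypothetical combined results table built from values reported in this log.
all_scores = pd.DataFrame({
    "model": ["Gaussian Process", "Gradient Boosting", "QDA",
              "Ridge Classifier", "Ridge ClassifierCV"],
    "test_mcc": [0.80, 0.90, 0.24, 0.73, 0.72],
    "bts_mcc": [0.59, 0.78, 0.12, 0.65, 0.65],
})

# Select the columns and sort in one expression; the result is a new
# DataFrame, so nothing is modified through a slice of all_scores.
rouC_CT = all_scores[["model", "test_mcc"]].sort_values(by=["test_mcc"], ascending=False)
rouC_BT = all_scores[["model", "bts_mcc"]].sort_values(by=["bts_mcc"], ascending=False)

print(rouC_CT)
print(rouC_BT)
```

Calling `.copy()` on the slice before the in-place sort is an alternative that keeps the original statement unchanged.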