/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274

count of NULL values before imputation
or_mychisq 339
log10_or_mychisq 339
dtype: int64

count of NULL values AFTER imputation
mutationinformation 0
or_rawI 0
logorI 0
dtype: int64
PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176

-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 70/30
Original data size: (1132, 176)
Train data size: (758, 176)
Test data size: (374, 176)
y_train numbers: Counter({0: 554, 1: 204})
y_train ratio: 2.715686274509804
y_test_numbers: Counter({0: 273, 1: 101})
y_test ratio: 2.702970297029703
-------------------------------------------------------------

index: 0 ind: 1 Mask count check: True
index: 1 ind: 2 Mask count check: True
index: 2 ind: 3 Mask count check: True

Original Data
Counter({0: 554, 1: 204}) Data dim: (758, 176)
Simple Random OverSampling
Counter({0: 554, 1: 554})
(1108, 176)
Simple Random UnderSampling
Counter({0: 204, 1: 204})
(408, 176)
Simple Combined Over and UnderSampling
Counter({0: 554, 1: 554})
(1108, 176)
SMOTE_NC OverSampling
Counter({0: 554, 1: 554})
(1108, 176)

#####################################################################
Running ML analysis [COMPLETE DATA]: 70/30 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_cd_7030/

Sanity checks:
Total input features: 176
Training data size: (758, 176)
Test data size: (374, 176)
Target feature numbers (training data): Counter({0: 554, 1: 204})
Target features ratio (training data: 2.715686274509804
Target feature numbers (test data): Counter({0: 273, 1: 101})
Target features ratio (test data): 2.702970297029703
#####################################################################

================================================================
Strucutral features (n): 37
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================

AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================

Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================

Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================

Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================

Pass: No. of features match

#####################################################################

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
 ColumnTransformer(remainder='passthrough',
 transformers=[('num', MinMaxScaler(),
 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)),
 ('cat', OneHotEncoder(),
 Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
 ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04495931 0.05544949 0.04286551 0.04531431 0.03928089 0.03883028 0.04265618 0.03910923 0.03948498 0.03936028]
mean value: 0.04273104667663574

key: score_time
value: [0.0126586 0.01351953 0.01310349 0.01323962 0.01216793 0.01326966 0.01327348 0.01291776 0.013165 0.01312828]
mean value: 0.013044333457946778

key: test_mcc
value: [0.48304589 0.63304195 0.52205834 0.60684923 0.64368314 0.28501393 0.52663543 0.56622086 0.68863504 0.56917479]
mean value: 0.5524358605438721

key: train_mcc
value: [0.71022006 0.6979828 0.6967624 0.70601791 0.69141924 0.7044883 0.72431165 0.72934313 0.70196615 0.68425989]
mean value: 0.7046771545445295

key: test_accuracy
value: [0.80263158 0.85526316 0.82894737 0.84210526 0.85526316 0.73684211 0.81578947 0.82894737 0.88 0.84 ]
mean value: 0.8285789473684211

key: train_accuracy
value: [0.88709677 0.88416422 0.88269795 0.8856305 0.88123167 0.88416422 0.89296188 0.89442815 0.88433382 0.87847731]
mean value: 0.8855186493948124

key: test_fscore
value: [0.61538462 0.73170732 0.60606061 0.71428571 0.74418605 0.44444444 0.65 0.68292683 0.76923077 0.66666667]
mean value: 0.6624893008925907

key: train_fscore
value: [0.7867036 0.77363897 0.7752809 0.78333333 0.77053824 0.78356164 0.79665738 0.80110497 0.77994429 0.76487252]
mean value: 0.7815635854192167

key: test_precision
value: [0.63157895 0.71428571 0.76923077 0.68181818 0.72727273 0.53333333 0.68421053 0.7 0.78947368 0.75 ]
mean value: 0.6981203883835463

key: train_precision
value: [0.80225989 0.81818182 0.80232558 0.80113636 0.8 0.78571429 0.8125 0.81005587 0.8 0.79881657]
mean value: 0.8030990369902591

key: test_recall
value: [0.6 0.75 0.5 0.75 0.76190476 0.38095238 0.61904762 0.66666667 0.75 0.6 ]
mean value: 0.6378571428571429

key: train_recall
value: [0.77173913 0.73369565 0.75 0.76630435 0.7431694 0.78142077 0.78142077 0.79234973 0.76086957 0.73369565]
mean value: 0.7614665003563792

key: test_roc_auc
value: [0.7375 0.82142857 0.72321429 0.8125 0.82640693 0.62683983 0.75497835 0.77878788 0.83863636 0.76363636]
mean value: 0.7683928571428571

key: train_roc_auc
value: [0.850729 0.83672734 0.84086345 0.84801161 0.83751656 0.85163223 0.85764425 0.86210673 0.84536464 0.83277969]
mean value: 0.8463375511496104

key: test_jcc
value: [0.44444444 0.57692308 0.43478261 0.55555556 0.59259259 0.28571429 0.48148148 0.51851852 0.625 0.5 ]
mean value: 0.5015012563925607

key: train_jcc
value: [0.64840183 0.63084112 0.63302752 0.64383562 0.62672811 0.64414414 0.66203704 0.66820276 0.63926941 0.61926606]
mean value: 0.6415753605549265

MCC on Blind test: 0.59
Accuracy on Blind test: 0.84

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.11514831 1.08928394 1.08557916 0.98781013 1.01957607 0.89390969 0.96477437 1.0153501 0.90592122 1.03056788] mean value: 1.010792088508606 key: score_time value: [0.01500988 0.01334858 0.01334858 0.01227927 0.01526356 0.01327825 0.01342034 0.01259232 0.0133462 0.01332021] mean value: 0.01352071762084961 key: test_mcc value: [0.51048128 0.621059 0.79080467 0.71205323 0.64368314 0.35894245 0.52663543 0.53939394 0.6983799 0.59090909] mean value: 0.5992342136550841 key: train_mcc value: [0.83684028 0.77396514 0.82137245 0.76171102 0.77379327 0.76434019 0.76695008 0.77518908 0.78917499 0.81581794] mean value: 0.7879154432429085 key: test_accuracy value: [0.81578947 0.85526316 0.92105263 0.88157895 0.85526316 0.76315789 0.81578947 0.81578947 0.88 0.84 ] mean value: 0.8443684210526315 key: train_accuracy value: [0.93548387 0.91202346 0.92961877 0.90762463 0.91202346 0.90762463 0.90909091 0.91202346 
0.91800878 0.92825769] mean value: 0.9171779667930425 key: test_fscore value: [0.63157895 0.71794872 0.83333333 0.79069767 0.74418605 0.5 0.65 0.66666667 0.7804878 0.7 ] mean value: 0.701489919112542 key: train_fscore value: [0.88108108 0.83333333 0.86956522 0.82352941 0.83333333 0.82739726 0.82872928 0.83516484 0.84444444 0.86426593] mean value: 0.8440844126532805 key: test_precision value: [0.66666667 0.73684211 0.9375 0.73913043 0.72727273 0.6 0.68421053 0.66666667 0.76190476 0.7 ] mean value: 0.7220193888872378 key: train_precision value: [0.87634409 0.85227273 0.86956522 0.84971098 0.84745763 0.82967033 0.83798883 0.83977901 0.86363636 0.88135593] mean value: 0.8547781098313728 key: test_recall value: [0.6 0.7 0.75 0.85 0.76190476 0.42857143 0.61904762 0.66666667 0.8 0.7 ] mean value: 0.6876190476190476 key: train_recall value: [0.88586957 0.81521739 0.86956522 0.79891304 0.81967213 0.82513661 0.81967213 0.83060109 0.82608696 0.84782609] mean value: 0.833856022808268 key: test_roc_auc value: [0.74642857 0.80535714 0.86607143 0.87142857 0.82640693 0.65974026 0.75497835 0.76969697 0.85454545 0.79545455] mean value: 0.7950108225108226 key: train_roc_auc value: [0.91984241 0.88150428 0.91068622 0.8733521 0.88278196 0.88150618 0.88077795 0.88624243 0.88899538 0.90287096] mean value: 0.8908559878389313 key: test_jcc value: [0.46153846 0.56 0.71428571 0.65384615 0.59259259 0.33333333 0.48148148 0.5 0.64 0.53846154] mean value: 0.5475539275539275 key: train_jcc value: [0.78743961 0.71428571 0.76923077 0.7 0.71428571 0.70560748 0.70754717 0.71698113 0.73076923 0.76097561] mean value: 0.7307122430376403 MCC on Blind test: 0.63 Accuracy on Blind test: 0.86 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 
'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01584315 0.01114631 0.01075149 0.01200366 0.01082635 0.01082635 0.01086259 0.01103044 0.01095486 0.01122904] mean value: 0.011547422409057618 key: score_time value: [0.00997329 0.00975752 0.00908828 0.00989461 0.00902319 0.00905466 0.00912499 0.00912809 0.00912166 0.00911331] mean value: 0.009327960014343262 key: test_mcc value: [0.38785122 0.36539907 0.43046947 0.43046947 0.34199134 0.42559698 0.34376305 0.38523946 0.6048462 0.71055169] mean value: 0.44261779449865774 key: train_mcc value: [0.50060055 0.53153318 0.48417992 0.49334101 0.49196558 0.50838861 0.4936543 0.46999696 0.50410801 0.4705211 ] mean value: 0.49482892039129556 key: test_accuracy value: [0.73684211 0.73684211 0.75 0.75 0.73684211 0.76315789 0.69736842 0.71052632 0.84 0.88 ] mean value: 0.7601578947368421 key: train_accuracy value: [0.78592375 0.80058651 0.77859238 0.78152493 0.78152493 0.7888563 0.7888563 0.74633431 0.79062958 0.77306003] mean value: 0.7815889018174949 key: test_fscore value: [0.56521739 0.54545455 0.59574468 0.59574468 0.52380952 0.59090909 0.54901961 0.57692308 0.71428571 0.79069767] mean value: 0.6047805986650169 key: train_fscore value: [0.64563107 0.66666667 0.63438257 0.64096386 0.63922518 0.65048544 0.63819095 0.62634989 0.64691358 0.62469734] mean value: 0.6413506538717907 key: test_precision value: [0.5 0.5 0.51851852 0.51851852 0.52380952 0.56521739 0.46666667 0.48387097 0.68181818 0.73913043] mean value: 0.5497550203160301 key: train_precision value: [0.58333333 0.60714286 0.5720524 0.57575758 0.57391304 
0.58515284 0.59069767 0.51785714 0.59276018 0.56331878] mean value: 0.5761985825450499 key: test_recall value: [0.65 0.6 0.7 0.7 0.52380952 0.61904762 0.66666667 0.71428571 0.75 0.85 ] mean value: 0.6773809523809524 key: train_recall value: [0.72282609 0.73913043 0.71195652 0.72282609 0.72131148 0.73224044 0.69398907 0.79234973 0.71195652 0.70108696] mean value: 0.7249673319078166 key: test_roc_auc value: [0.70892857 0.69285714 0.73392857 0.73392857 0.67099567 0.71861472 0.68787879 0.71168831 0.81136364 0.87045455] mean value: 0.7340638528138528 key: train_roc_auc value: [0.76603152 0.7812118 0.75758469 0.76301947 0.76245934 0.77092984 0.75881818 0.76090432 0.7657979 0.75034308] mean value: 0.7637100142327954 key: test_jcc value: [0.39393939 0.375 0.42424242 0.42424242 0.35483871 0.41935484 0.37837838 0.40540541 0.55555556 0.65384615] mean value: 0.43848032839968326 key: train_jcc value: [0.47670251 0.5 0.46453901 0.47163121 0.46975089 0.48201439 0.46863469 0.45597484 0.47810219 0.45422535] mean value: 0.4721575070903312 MCC on Blind test: 0.41 Accuracy on Blind test: 0.75 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01117373 0.01130366 0.01122308 0.0110395 0.01203394 0.01106477 0.01097441 0.01187778 0.01183271 0.01108146] mean value: 0.011360502243041993 key: score_time value: [0.00908971 0.0091331 0.00910974 0.00899458 0.00902104 0.00903058 0.00908709 0.00987816 0.00898623 0.00970459] mean value: 0.009203481674194335 key: test_mcc value: [0.42968224 0.45714286 0.53968028 0.54096275 0.43257867 0.38416102 0.52663543 0.37442392 0.52764485 0.4955746 ] mean value: 0.4708486628265047 key: train_mcc value: [0.54291625 0.53559377 0.53542123 0.56534397 0.53524411 0.55345132 0.5311225 0.54923819 0.54615584 0.52355185] mean value: 0.5418039021364769 key: test_accuracy value: [0.76315789 0.78947368 0.82894737 0.81578947 0.77631579 0.75 0.81578947 0.76315789 0.82666667 0.81333333] mean value: 0.7942631578947369 key: train_accuracy value: [0.82111437 0.82111437 0.81818182 0.82844575 0.81964809 0.82404692 0.81524927 0.82111437 0.82284041 0.81551977] mean value: 0.8207275131707192 key: test_fscore value: [0.59090909 0.6 0.64864865 0.66666667 0.58536585 0.55813953 0.65 0.52631579 0.62857143 0.61111111] mean value: 0.6065728123922888 key: train_fscore value: [0.66483516 0.65536723 0.65934066 0.68292683 0.65738162 0.67391304 0.6576087 0.67204301 0.66666667 0.64804469] mean value: 0.663812760996864 key: test_precision value: [0.54166667 0.6 0.70588235 0.63636364 0.6 0.54545455 0.68421053 0.58823529 0.73333333 0.6875 ] mean value: 0.6322646355192795 key: train_precision value: [0.67222222 0.68235294 0.66666667 0.68108108 0.67045455 0.67027027 0.65405405 0.66137566 0.67597765 0.66666667] mean value: 
0.6701121762598923 key: test_recall value: [0.65 0.6 0.6 0.7 0.57142857 0.57142857 0.61904762 0.47619048 0.55 0.55 ] mean value: 0.5888095238095238 key: train_recall value: [0.6576087 0.63043478 0.65217391 0.68478261 0.64480874 0.67759563 0.66120219 0.68306011 0.6576087 0.63043478] mean value: 0.6579710144927536 key: test_roc_auc value: [0.72678571 0.72857143 0.75535714 0.77857143 0.71298701 0.69480519 0.75497835 0.67445887 0.73863636 0.72954545] mean value: 0.729469696969697 key: train_roc_auc value: [0.7695674 0.76100052 0.76584599 0.78315436 0.76428814 0.77767557 0.76647284 0.7774018 0.77068812 0.75710116] mean value: 0.7693195890646318 key: test_jcc value: [0.41935484 0.42857143 0.48 0.5 0.4137931 0.38709677 0.48148148 0.35714286 0.45833333 0.44 ] mean value: 0.4365773816880602 key: train_jcc value: [0.49794239 0.48739496 0.49180328 0.51851852 0.48962656 0.50819672 0.48987854 0.50607287 0.5 0.47933884] mean value: 0.496877267932884 MCC on Blind test: 0.47 Accuracy on Blind test: 0.8 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, 
gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01400113 0.01252866 0.0119729 0.01145363 0.01141572 0.01111579 0.01188993 0.01115608 0.01220417 0.01183653] mean value: 0.011957454681396484 key: score_time value: [0.08364654 0.01389098 0.01768208 0.01454377 0.01357603 0.01330209 0.01355171 0.01357627 0.0139761 0.0141139 ] mean value: 0.021185946464538575 key: test_mcc value: [0.18448201 0.47873298 0.36335261 0.44019762 0.28501393 0.23769831 0.3688017 0.3688017 0.48492277 0.37787109] mean value: 0.3589874726318814 key: train_mcc value: [0.55081621 0.55081621 0.53338855 0.5865119 0.52088082 0.55352855 0.56690744 0.5531166 0.5917446 0.5373988 ] mean value: 0.5545109680698154 key: test_accuracy value: [0.73684211 0.81578947 0.77631579 0.78947368 0.73684211 0.72368421 0.77631579 0.77631579 0.81333333 0.78666667] mean value: 0.773157894736842 key: train_accuracy value: [0.83577713 0.83577713 0.82991202 0.84750733 0.82697947 0.8372434 0.84164223 0.8372434 0.84919473 0.83162518] mean value: 0.8372902023589219 key: test_fscore value: [0.28571429 0.5625 0.48484848 0.57894737 0.44444444 0.4 0.4516129 0.4516129 0.58823529 0.38461538] mean value: 0.46325310686129123 key: train_fscore value: [0.61111111 0.61111111 0.61073826 0.65100671 0.58741259 0.62626263 0.63758389 0.6185567 0.64604811 0.60750853] mean value: 0.620733963837761 key: test_precision value: [0.5 0.75 0.61538462 0.61111111 0.53333333 0.5 0.7 0.7 0.71428571 0.83333333] mean value: 0.6457448107448107 key: train_precision value: [0.84615385 0.84615385 0.79824561 0.85087719 0.81553398 0.81578947 0.82608696 0.83333333 0.87850467 0.81651376] mean value: 
0.832719267781213 key: test_recall value: [0.2 0.45 0.4 0.55 0.38095238 0.33333333 0.33333333 0.33333333 0.5 0.25 ] mean value: 0.3730952380952381 key: train_recall value: [0.47826087 0.47826087 0.49456522 0.52717391 0.45901639 0.50819672 0.51912568 0.49180328 0.51086957 0.48369565] mean value: 0.4950968163459254 key: test_roc_auc value: [0.56428571 0.69821429 0.65535714 0.7125 0.62683983 0.6030303 0.63939394 0.63939394 0.71363636 0.61590909] mean value: 0.6468560606060606 key: train_roc_auc value: [0.72306618 0.72306618 0.72419024 0.74651868 0.71047012 0.73305628 0.73952276 0.72786557 0.74240873 0.72180775] mean value: 0.7291972480213341 key: test_jcc value: [0.16666667 0.39130435 0.32 0.40740741 0.28571429 0.25 0.29166667 0.29166667 0.41666667 0.23809524] mean value: 0.3059187945709685 key: train_jcc value: [0.44 0.44 0.43961353 0.48258706 0.41584158 0.45588235 0.4679803 0.44776119 0.47715736 0.43627451] mean value: 0.45030978881526235 MCC on Blind test: 0.34 Accuracy on Blind test: 0.77 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.03588939 0.03526592 0.03247428 0.03409171 0.03312564 0.03497338 0.03531742 0.02957702 0.03257608 0.03178263] mean value: 0.033507347106933594 key: score_time value: [0.01449561 0.01631856 0.01578045 0.01557374 0.01439095 0.01470923 0.01556349 0.01594019 0.0141511 0.01432633] mean value: 0.015124964714050292 key: test_mcc value: [0.55205245 0.45439995 0.52947472 0.55205245 0.41708468 0.30381296 0.37442392 0.58625681 0.75377836 0.60004605] mean value: 0.5123382350291168 key: train_mcc value: [0.66985139 0.68357391 0.67042724 0.66985139 0.64128448 0.70395629 0.69975956 0.70918624 0.65449684 0.67530917] mean value: 0.6777696518924042 key: test_accuracy value: [0.82894737 0.80263158 0.82894737 0.82894737 0.77631579 0.73684211 0.76315789 0.84210526 0.90666667 0.85333333] mean value: 0.8167894736842105 key: train_accuracy value: [0.87390029 0.8797654 0.87536657 0.87390029 0.86656891 0.88709677 0.8856305 0.88709677 0.8682284 0.87701318] mean value: 0.8774567094455632 key: test_fscore value: [0.66666667 0.57142857 0.62857143 0.66666667 0.56410256 0.47368421 0.52631579 0.68421053 0.8 0.66666667] mean value: 0.6248313090418354 key: train_fscore value: [0.75144509 0.75882353 0.74626866 0.75144509 0.70926518 0.77681159 0.77325581 0.78551532 0.73988439 0.75147929] mean value: 0.7544193946752498 key: test_precision value: [0.68421053 0.66666667 0.73333333 0.68421053 0.61111111 0.52941176 0.58823529 0.76470588 0.93333333 0.84615385] mean value: 0.704137228440634 key: train_precision value: [0.80246914 0.82692308 0.82781457 0.80246914 0.85384615 0.82716049 0.82608696 0.80113636 
0.79012346 0.82467532] mean value: 0.8182704667361305 key: test_recall value: [0.65 0.5 0.55 0.65 0.52380952 0.42857143 0.47619048 0.61904762 0.7 0.55 ] mean value: 0.5647619047619048 key: train_recall value: [0.70652174 0.70108696 0.67934783 0.70652174 0.60655738 0.73224044 0.72677596 0.7704918 0.69565217 0.69021739] mean value: 0.7015413399857449 key: test_roc_auc value: [0.77142857 0.70535714 0.73928571 0.77142857 0.6982684 0.64155844 0.67445887 0.77316017 0.84090909 0.75681818] mean value: 0.737267316017316 key: train_roc_auc value: [0.82113236 0.82343504 0.8135695 0.82113236 0.78424061 0.83806411 0.83533187 0.85017576 0.81375795 0.81805459] mean value: 0.821889413503991 key: test_jcc value: [0.5 0.4 0.45833333 0.5 0.39285714 0.31034483 0.35714286 0.52 0.66666667 0.5 ] mean value: 0.4605344827586207 key: train_jcc value: [0.60185185 0.61137441 0.5952381 0.60185185 0.54950495 0.63507109 0.63033175 0.64678899 0.58715596 0.60189573] mean value: 0.606106468934728 MCC on Blind test: 0.51 Accuracy on Blind test: 0.81 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process',
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.66347837 2.15114832 2.75264692 2.73905396 2.1834085 2.67941236 2.28768206 1.77603769 2.09980702 2.73488927] mean value: 2.4067564487457274 key: score_time value: [0.01294589 0.01303411 0.01551843 0.01541853 0.01326132 0.01550579 0.01292658 0.012887 0.01287413 0.01300073] mean value: 0.013737249374389648 key: test_mcc value: [0.44019762 0.55205245 0.68309183 0.68681493 0.64368314 0.4026607 0.45868247 0.48964721 0.52764485 0.64277498] mean value: 0.5527250178953496 key: train_mcc value: [0.97389659 0.90735013 0.96263725 0.97767156 0.93987894 0.97002758 0.95873491 0.89047817 0.90983387 0.97398605] mean value: 0.9464495043164388 key: test_accuracy value: [0.78947368 0.82894737 0.88157895 0.86842105 0.85526316 0.77631579 0.78947368 0.78947368 0.82666667 0.86666667] mean value: 0.8272280701754386 key: train_accuracy value: [0.98973607 0.96187683 0.98533724 0.99120235 0.97653959 0.98826979 0.98387097 0.95601173 0.96486091 0.9897511 ] mean value: 0.9787456580636574 key: 
test_fscore value: [0.57894737 0.66666667 0.75675676 0.77272727 0.74418605 0.54054054 0.6 0.63636364 0.62857143 0.72222222] mean value: 0.6646981938781205 key: train_fscore value: [0.98071625 0.93229167 0.97252747 0.98369565 0.95505618 0.97790055 0.96952909 0.92021277 0.93220339 0.98060942] mean value: 0.9604742437016127 key: test_precision value: [0.61111111 0.68421053 0.82352941 0.70833333 0.72727273 0.625 0.63157895 0.60869565 0.73333333 0.8125 ] mean value: 0.6965565042673334 key: train_precision value: [0.99441341 0.895 0.98333333 0.98369565 0.98265896 0.98882682 0.98314607 0.89637306 0.97058824 1. ] mean value: 0.9678035528213172 key: test_recall value: [0.55 0.65 0.7 0.85 0.76190476 0.47619048 0.57142857 0.66666667 0.55 0.65 ] mean value: 0.6426190476190476 key: train_recall value: [0.9673913 0.97282609 0.96195652 0.98369565 0.92896175 0.96721311 0.95628415 0.94535519 0.89673913 0.96195652] mean value: 0.9542379425041577 key: test_roc_auc value: [0.7125 0.77142857 0.82321429 0.8625 0.82640693 0.68354978 0.72207792 0.75151515 0.73863636 0.79772727] mean value: 0.7689556277056278 key: train_roc_auc value: [0.98269164 0.96532871 0.97796621 0.98883578 0.96147486 0.98160255 0.97513606 0.95263752 0.94335955 0.98097826] mean value: 0.9710011130457062 key: test_jcc value: [0.40740741 0.5 0.60869565 0.62962963 0.59259259 0.37037037 0.42857143 0.46666667 0.45833333 0.56521739] mean value: 0.5027484472049689 key: train_jcc value: [0.96216216 0.87317073 0.94652406 0.96791444 0.91397849 0.95675676 0.94086022 0.85221675 0.87301587 0.96195652] mean value: 0.9248556006500929 MCC on Blind test: 0.58 Accuracy on Blind test: 0.83 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 
'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03639221 0.03556943 0.02711511 0.0274179 0.02753472 0.03158092 0.03024459 0.03069091 0.03130579 0.0289371 ] mean value: 0.030678868293762207 key: score_time value: [0.01144648 0.00936389 0.00901127 0.00917649 0.00972128 0.00914931 0.00938988 0.00916195 0.00979638 0.00974703] mean value: 0.00959639549255371 key: test_mcc value: [0.82650337 0.57092239 0.76668414 0.79161589 0.53939394 0.67099567 0.64368314 0.69392691 0.75376307 0.6983799 ] mean value: 0.69558684338617 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93421053 0.84210526 0.90789474 0.92105263 0.81578947 0.86842105 0.85526316 0.88157895 0.90666667 0.88 ] mean value: 0.881298245614035 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86486486 0.66666667 0.82926829 0.84210526 0.66666667 0.76190476 0.74418605 0.76923077 0.81081081 0.7804878 ] mean value: 0.7736191947375038 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94117647 0.75 0.80952381 0.88888889 0.66666667 0.76190476 0.72727273 0.83333333 0.88235294 0.76190476] mean value: 0.8023024361259655 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8 0.6 0.85 0.8 0.66666667 0.76190476 0.76190476 0.71428571 0.75 0.8 ] mean value: 0.7504761904761905 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.89107143 0.76428571 0.88928571 0.88214286 0.76969697 0.83549784 0.82640693 0.82987013 0.85681818 0.85454545] mean value: 0.8399621212121212 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76190476 0.5 0.70833333 0.72727273 0.5 0.61538462 0.59259259 0.625 0.68181818 0.64 ] mean value: 0.6352306212306212 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.7 Accuracy on Blind test: 0.88 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.15146351 0.15494537 0.14572144 0.14574671 0.14581585 0.14503813 0.14534903 0.1439271 0.14520812 0.14628482] mean value: 0.14695000648498535 key: score_time value: [0.01862741 0.01811218 0.01806188 0.01808953 0.01812577 0.01807761 0.0182085 0.01812768 0.01813745 0.01835513] mean value: 0.018192315101623537 key: test_mcc value: [0.43358045 0.61138605 0.6409855 0.4976283 0.53939394 0.61721663 0.49939976 0.51564585 0.71637516 0.60302269] mean value: 0.5674634342197126 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.80263158 0.85526316 0.86842105 0.81578947 0.81578947 0.85526316 0.80263158 0.81578947 0.89333333 0.85333333] mean value: 0.8378245614035088 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.51612903 0.7027027 0.70588235 0.61111111 0.66666667 0.68571429 0.63414634 0.63157895 0.77777778 0.68571429] mean value: 0.6617423503717906 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.72727273 0.76470588 0.85714286 0.6875 0.66666667 0.85714286 0.65 0.70588235 0.875 0.8 ] mean value: 0.7591313343519226 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.4 0.65 0.6 0.55 0.66666667 0.57142857 0.61904762 0.57142857 0.7 0.6 ] mean value: 0.5928571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.67321429 0.78928571 0.78214286 0.73035714 0.76969697 0.76753247 0.74588745 0.74025974 0.83181818 0.77272727] mean value: 0.7602922077922077 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.34782609 0.54166667 0.54545455 0.44 0.5 0.52173913 0.46428571 0.46153846 0.63636364 0.52173913] mean value: 0.4980613372135111 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.5 Accuracy on Blind test: 0.81 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0110414 0.01136756 0.01222539 0.0108819 0.01099682 0.01106071 0.01090479 0.0108161 0.01123643 0.01108217] mean value: 0.011161327362060547 key: score_time value: [0.00881791 0.00877786 0.00997138 0.00875568 0.00880456 0.00875354 0.00876141 0.00882554 0.0088098 0.0087533 ] mean value: 0.008903098106384278 key: test_mcc value: [0.27602622 0.45187994 0.49939976 0.45714286 0.54677939 0.18613561 0.24759308 0.36154674 0.22613351 0.18181818] mean value: 0.34344552857203775 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.72368421 0.77631579 0.80263158 0.78947368 0.80263158 0.69736842 0.73684211 0.73684211 0.72 0.68 ] mean value: 0.746578947368421 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.46153846 0.60465116 0.63414634 0.6 0.68085106 0.37837838 0.375 0.54545455 0.4 0.4 ] mean value: 0.5080019953455285 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.47368421 0.56521739 0.61904762 0.6 0.61538462 0.4375 0.54545455 0.52173913 0.46666667 0.4 ] mean value: 0.5244694178818893 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.45 0.65 0.65 0.6 0.76190476 0.33333333 0.28571429 0.57142857 0.35 0.4 ] mean value: 0.5052380952380953 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63571429 0.73571429 0.75357143 0.72857143 0.79004329 0.58484848 0.5974026 0.68571429 0.60227273 0.59090909] mean value: 0.6704761904761904 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.3 0.43333333 0.46428571 0.42857143 0.51612903 0.23333333 0.23076923 0.375 0.25 0.25 ] mean value: 0.34814220725511047 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.38 Accuracy on Blind test: 0.75 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.25220633 2.30483413 2.25472188 2.22439718 2.26833797 2.24337077 2.26458836 2.2349503 2.25734282 2.28761482] mean value: 2.259236454963684 key: score_time value: [0.1039598 0.09403181 0.09515738 0.09428215 0.09438896 0.09375715 0.0979569 0.0937984 0.10010552 0.09398079] mean value: 0.0961418867111206 key: test_mcc value: [0.6409855 0.65104858 0.71751058 0.75907212 0.56622086 0.66254135 0.69986305 0.76353586 0.82577865 0.72009768] mean value: 0.7006654225859422 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86842105 0.86842105 0.89473684 0.90789474 0.82894737 0.86842105 0.88157895 0.90789474 0.93333333 0.89333333] mean value: 0.8852982456140351 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70588235 0.73684211 0.77777778 0.82051282 0.68292683 0.75 0.7804878 0.81081081 0.86486486 0.78947368] mean value: 0.7719579050527475 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.77777778 0.875 0.84210526 0.7 0.78947368 0.8 0.9375 0.94117647 0.83333333] mean value: 0.8353509386210625 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6 0.7 0.7 0.8 0.66666667 0.71428571 0.76190476 0.71428571 0.8 0.75 ] mean value: 0.7207142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.78214286 0.81428571 0.83214286 0.87321429 0.77878788 0.82077922 0.84458874 0.84805195 0.89090909 0.84772727] mean value: 0.8332629870129871 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.54545455 0.58333333 0.63636364 0.69565217 0.51851852 0.6 0.64 0.68181818 0.76190476 0.65217391] mean value: 0.6315219064349499 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.67 Accuracy on Blind test: 0.87 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))])
key: fit_time value: [1.94173408 1.01432633 1.08325291 1.04413033 1.01967764 1.01618648 1.04146981 1.02105212 1.05572248 1.04665756] mean value: 1.1284209728240966 key: score_time value: [0.27791572 0.25200295 0.27757406 0.13488817 0.25813341 0.24319124 0.25587392 0.16115642 0.27250767 0.15622592] mean value: 0.22894694805145263 key: test_mcc value: [0.6409855 0.68309183 0.71751058 0.68309183 0.52663543 0.69392691 0.70856367 0.72858509 0.71706665 0.64277498] mean value: 0.6742232454814421 key: train_mcc value: [0.92493762 0.93257835 0.92914668 0.94018532 0.91346549 0.93233511 0.94372729 0.93610722 0.93273198 0.9138245 ] mean value: 0.9299039551930306 key: test_accuracy value: [0.86842105 0.88157895 0.89473684 0.88157895 0.81578947 0.88157895 0.88157895 0.89473684 0.89333333 0.86666667] mean value: 0.876 key: train_accuracy value: [0.97067449 0.97360704 0.97214076 0.97653959 0.96627566 0.97360704 0.97800587 0.97507331 0.97364568 0.96632504] mean value: 0.9725894471088823 key: test_fscore value: [0.70588235 0.75675676 0.77777778 0.75675676 0.65 0.76923077 0.79069767 0.77777778 0.76470588 0.72222222] mean value: 0.7471807970234783 key: train_fscore value: [0.94413408 0.9494382 0.94586895 0.95505618 0.93409742 0.94915254 0.95774648 0.95211268 0.94915254 0.93447293] mean value: 0.9471232001455421 key: test_precision value: [0.85714286 0.82352941 0.875 0.82352941 0.68421053 0.83333333 0.77272727 0.93333333 0.92857143 0.8125 ] mean value: 0.8343877574953427 key: train_precision value: [0.97126437 0.98255814 0.99401198 0.98837209 0.98192771 0.98245614 0.98837209 0.98255814 0.98823529 0.98203593] mean value: 
0.9841791882435885 key: test_recall value: [0.6 0.7 0.7 0.7 0.61904762 0.71428571 0.80952381 0.66666667 0.65 0.65 ] mean value: 0.680952380952381 key: train_recall value: [0.91847826 0.91847826 0.90217391 0.92391304 0.89071038 0.91803279 0.92896175 0.92349727 0.91304348 0.89130435] mean value: 0.9128593490140176 key: test_roc_auc value: [0.78214286 0.82321429 0.83214286 0.82321429 0.75497835 0.82987013 0.85930736 0.82424242 0.81590909 0.79772727] mean value: 0.8142748917748918 key: train_roc_auc value: [0.95421905 0.95622708 0.95008294 0.95994849 0.94234918 0.95601038 0.96247687 0.95874262 0.95451773 0.94264616] mean value: 0.9537220504235004 key: test_jcc value: [0.54545455 0.60869565 0.63636364 0.60869565 0.48148148 0.625 0.65384615 0.63636364 0.61904762 0.56521739] mean value: 0.5980165768209247 key: train_jcc value: [0.89417989 0.90374332 0.8972973 0.91397849 0.87634409 0.90322581 0.91891892 0.90860215 0.90322581 0.87700535] mean value: 0.8996521117583736 MCC on Blind test: 0.67 Accuracy on Blind test: 0.87 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0255971 0.01135564 0.01114917 0.01224518 0.01257515 0.01233673 0.01233101 0.01187515 0.01254964 0.01140475] mean value: 0.013341951370239257 key: score_time value: [0.00925922 0.00919247 0.00999546 0.00957441 0.00994539 0.0099535 0.00913477 0.00995517 0.0098753 0.00904942] mean value: 0.009593510627746582 key: test_mcc value: [0.42968224 0.45714286 0.53968028 0.54096275 0.43257867 0.38416102 0.52663543 0.37442392 0.52764485 0.4955746 ] mean value: 0.4708486628265047 key: train_mcc value: [0.54291625 0.53559377 0.53542123 0.56534397 0.53524411 0.55345132 0.5311225 0.54923819 0.54615584 0.52355185] mean value: 0.5418039021364769 key: test_accuracy value: [0.76315789 0.78947368 0.82894737 0.81578947 0.77631579 0.75 0.81578947 0.76315789 0.82666667 0.81333333] mean value: 0.7942631578947369 key: train_accuracy value: [0.82111437 0.82111437 0.81818182 0.82844575 0.81964809 0.82404692 0.81524927 0.82111437 0.82284041 0.81551977] mean value: 0.8207275131707192 key: test_fscore value: [0.59090909 0.6 0.64864865 0.66666667 0.58536585 0.55813953 0.65 0.52631579 0.62857143 0.61111111] mean value: 0.6065728123922888 key: train_fscore value: [0.66483516 0.65536723 0.65934066 0.68292683 0.65738162 0.67391304 0.6576087 0.67204301 0.66666667 0.64804469] mean value: 0.663812760996864 key: test_precision value: [0.54166667 0.6 0.70588235 0.63636364 0.6 0.54545455 0.68421053 0.58823529 0.73333333 0.6875 ] mean value: 0.6322646355192795 key: train_precision value: [0.67222222 0.68235294 0.66666667 0.68108108 0.67045455 0.67027027 0.65405405 0.66137566 0.67597765 0.66666667] mean value: 
0.6701121762598923 key: test_recall value: [0.65 0.6 0.6 0.7 0.57142857 0.57142857 0.61904762 0.47619048 0.55 0.55 ] mean value: 0.5888095238095238 key: train_recall value: [0.6576087 0.63043478 0.65217391 0.68478261 0.64480874 0.67759563 0.66120219 0.68306011 0.6576087 0.63043478] mean value: 0.6579710144927536 key: test_roc_auc value: [0.72678571 0.72857143 0.75535714 0.77857143 0.71298701 0.69480519 0.75497835 0.67445887 0.73863636 0.72954545] mean value: 0.729469696969697 key: train_roc_auc value: [0.7695674 0.76100052 0.76584599 0.78315436 0.76428814 0.77767557 0.76647284 0.7774018 0.77068812 0.75710116] mean value: 0.7693195890646318 key: test_jcc value: [0.41935484 0.42857143 0.48 0.5 0.4137931 0.38709677 0.48148148 0.35714286 0.45833333 0.44 ] mean value: 0.4365773816880602 key: train_jcc value: [0.49794239 0.48739496 0.49180328 0.51851852 0.48962656 0.50819672 0.48987854 0.50607287 0.5 0.47933884] mean value: 0.496877267932884 MCC on Blind test: 0.47 Accuracy on Blind test: 0.8 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.13928175 0.11773467 0.10438395 0.10354257 0.10556793 0.10095 0.10506415 0.10373998 0.11169815 0.11128283] mean value: 0.11032459735870362 key: score_time value: [0.01147556 0.01137829 0.0112083 0.01126432 0.01129389 0.01115394 0.01117277 0.01144838 0.01209068 0.01141405] mean value: 0.011390018463134765 key: test_mcc value: [0.66071429 0.77709656 0.82807867 0.83350524 0.60519481 0.75730256 0.7734442 0.73049431 0.82728639 0.89983564] mean value: 0.769295265668651 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86842105 0.90789474 0.93421053 0.93421053 0.84210526 0.89473684 0.90789474 0.89473684 0.93333333 0.96 ] mean value: 0.9077543859649123 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.8372093 0.87179487 0.87804878 0.71428571 0.82608696 0.8372093 0.8 0.87179487 0.92682927] mean value: 0.8313259067828848 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.7826087 0.89473684 0.85714286 0.71428571 0.76 0.81818182 0.84210526 0.89473684 0.9047619 ] mean value: 0.821855993739289 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.9 0.85 0.9 0.71428571 0.9047619 0.85714286 0.76190476 0.85 0.95 ] mean value: 0.8438095238095238 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.83035714 0.90535714 0.90714286 0.92321429 0.8025974 0.8978355 0.89220779 0.85367965 0.90681818 0.95681818] mean value: 0.8876028138528138 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.72 0.77272727 0.7826087 0.55555556 0.7037037 0.72 0.66666667 0.77272727 0.86363636] mean value: 0.7157625530669008 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.79 Accuracy on Blind test: 0.91 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05319357 0.09460664 0.08337164 0.08630013 0.06278348 0.07761669 0.06303596 0.06139302 0.0718267 0.0702157 ] mean value: 0.07243435382843018 key: score_time value: [0.01877332 0.01906776 0.02187419 0.02077937 0.01742435 0.01251388 0.02134514 0.01239467 0.01223898 0.02151847] mean value: 0.01779301166534424 key: test_mcc value: [0.49396542 0.41403934 0.58076493 0.64700991 0.55369745 0.29893648 0.45868247 0.52663543 0.50830425 0.68174749] mean value: 0.5163783158687285 key: train_mcc value: [0.77596943 0.75954937 0.77050634 0.76515426 0.78951424 0.77260991 0.76769444 0.76847684 0.74055285 0.77000441] mean value: 0.7680032094125958 key: test_accuracy value: [0.78947368 0.77631579 
0.84210526 0.85526316 0.81578947 0.72368421 0.78947368 0.81578947 0.81333333 0.88 ] mean value: 0.8101228070175439 key: train_accuracy value: [0.91202346 0.90615836 0.91055718 0.90762463 0.91788856 0.91055718 0.90909091 0.90909091 0.89751098 0.91068814] mean value: 0.9091190323868735 key: test_fscore value: [0.63636364 0.56410256 0.68421053 0.74418605 0.68181818 0.48780488 0.6 0.65 0.63157895 0.75675676] mean value: 0.6436821537285758 key: train_fscore value: [0.83606557 0.82320442 0.83102493 0.82833787 0.84530387 0.83378747 0.82967033 0.83060109 0.81081081 0.83008357] mean value: 0.8298889931247613 key: test_precision value: [0.58333333 0.57894737 0.72222222 0.69565217 0.65217391 0.5 0.63157895 0.68421053 0.66666667 0.82352941] mean value: 0.6538314563048713 key: train_precision value: [0.84065934 0.83707865 0.84745763 0.83060109 0.8547486 0.83152174 0.83425414 0.83060109 0.80645161 0.85142857] mean value: 0.8364802475716324 key: test_recall value: [0.7 0.55 0.65 0.8 0.71428571 0.47619048 0.57142857 0.61904762 0.6 0.7 ] mean value: 0.638095238095238 key: train_recall value: [0.83152174 0.80978261 0.81521739 0.82608696 0.83606557 0.83606557 0.82513661 0.83060109 0.81521739 0.80978261] mean value: 0.8235477548111191 key: test_roc_auc value: [0.76071429 0.70357143 0.78035714 0.8375 0.78441558 0.64718615 0.72207792 0.75497835 0.74545455 0.82272727] mean value: 0.7558982683982683 key: train_roc_auc value: [0.8866444 0.87577484 0.88050026 0.88191898 0.89198068 0.88697066 0.88250819 0.88423842 0.87153655 0.8788392 ] mean value: 0.8820912189158894 key: test_jcc value: [0.46666667 0.39285714 0.52 0.59259259 0.51724138 0.32258065 0.42857143 0.48148148 0.46153846 0.60869565] mean value: 0.4792225450353322 key: train_jcc value: [0.71830986 0.69953052 0.71090047 0.70697674 0.73205742 0.71495327 0.70892019 0.71028037 0.68181818 0.70952381] mean value: 0.7093270833969725 MCC on Blind test: 0.58 Accuracy on Blind test: 0.84 Model_name: Multinomial Model func: MultinomialNB() List 
of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model 
pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01471019 0.01353598 0.01168084 0.01050878 0.0105772 0.01063776 0.01066518 0.01089907 0.01060247 0.01091695] mean value: 0.011473441123962402 key: score_time value: [0.0122757 0.0103693 0.00927901 0.00910711 0.0088954 0.0088973 0.00879335 0.00883627 0.00886154 0.00904274] mean value: 0.009435772895812988 key: test_mcc value: [0.43257867 0.56622086 0.621059 0.45187994 0.59458839 0.45868247 0.49939976 0.52663543 0.6048462 0.68174749] mean value: 0.5437638210554387 key: train_mcc value: [0.57575956 0.57012347 0.55191097 0.55790628 0.56084927 0.56982156 0.56533933 0.56234007 0.55668873 0.52682877] mean value: 0.5597568021362611 key: test_accuracy value: [0.77631579 0.82894737 0.85526316 0.77631579 0.84210526 0.78947368 0.80263158 0.81578947 0.84 0.88 ] mean value: 0.8206842105263158 key: train_accuracy value: [0.83284457 0.83284457 0.82404692 0.82697947 0.82844575 0.8313783 0.82991202 0.82844575 0.8272328 0.81405564] mean value: 0.8276185794085951 key: test_fscore value: 
[0.58536585 0.68292683 0.71794872 0.60465116 0.7 0.6 0.63414634 0.65 0.71428571 0.75675676] mean value: 0.664608137617213 key: train_fscore value: [0.69021739 0.68333333 0.67213115 0.67582418 0.67768595 0.68493151 0.68131868 0.67945205 0.67403315 0.65395095] mean value: 0.6772878344228326 key: test_precision value: [0.57142857 0.66666667 0.73684211 0.56521739 0.73684211 0.63157895 0.65 0.68421053 0.68181818 0.82352941] mean value: 0.6748133907192999 key: train_precision value: [0.69021739 0.69886364 0.67582418 0.68333333 0.68333333 0.68681319 0.68508287 0.68131868 0.68539326 0.6557377 ] mean value: 0.6825917574563871 key: test_recall value: [0.6 0.7 0.7 0.65 0.66666667 0.57142857 0.61904762 0.61904762 0.75 0.7 ] mean value: 0.6576190476190475 key: train_recall value: [0.69021739 0.66847826 0.66847826 0.66847826 0.67213115 0.68306011 0.67759563 0.67759563 0.66304348 0.65217391] mean value: 0.6721252078878593 key: test_roc_auc value: [0.71964286 0.7875 0.80535714 0.73571429 0.78787879 0.72207792 0.74588745 0.75497835 0.81136364 0.82272727] mean value: 0.7693127705627706 key: train_roc_auc value: [0.78787978 0.78102628 0.77500218 0.77701021 0.77895135 0.78441583 0.78168359 0.78068158 0.77540951 0.7629607 ] mean value: 0.7785021014127629 key: test_jcc value: [0.4137931 0.51851852 0.56 0.43333333 0.53846154 0.42857143 0.46428571 0.48148148 0.55555556 0.60869565] mean value: 0.500269632582976 key: train_jcc value: [0.52697095 0.51898734 0.50617284 0.51037344 0.5125 0.52083333 0.51666667 0.51452282 0.50833333 0.48582996] mean value: 0.5121190694042841 MCC on Blind test: 0.48 Accuracy on Blind test: 0.8 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02312326 0.01958394 0.0229795 0.02434564 0.0250411 0.02593589 0.02374864 0.02295756 0.02466846 0.02031732] mean value: 0.023270130157470703 key: score_time value: [0.01271677 0.01243877 0.01255846 0.01221299 0.01256347 0.01247096 0.01330996 0.01226854 0.01219082 0.01226783] mean value: 0.012499856948852538 key: test_mcc value: [0.18576195 0.47809144 0.47657854 0.57009641 0.62473393 0.44503488 0. 0.49168478 0.55925894 0.64277498] mean value: 0.44740158516400225 key: train_mcc value: [0.22933185 0.55584409 0.46931111 0.6515961 0.68742226 0.74986385 0.21142669 0.6664289 0.65980274 0.66844332] mean value: 0.554947091605564 key: test_accuracy value: [0.75 0.71052632 0.81578947 0.76315789 0.81578947 0.78947368 0.72368421 0.73684211 0.84 0.86666667] mean value: 0.7811929824561403 key: train_accuracy value: [0.74926686 0.74193548 0.80791789 0.81818182 0.85043988 0.90029326 0.74780059 0.8255132 0.87262079 0.87115666] mean value: 0.8185126426022851 key: test_fscore value: [0.17391304 0.62068966 0.5 0.67857143 0.73076923 0.57894737 0. 0.64285714 0.625 0.72222222] mean value: 0.5272970091491752 key: train_fscore value: [0.1319797 0.66917293 0.46530612 0.74058577 0.77027027 0.81818182 0.11340206 0.74947368 0.72555205 0.75555556] mean value: 0.5939479964816883 key: test_precision value: [0.66666667 0.47368421 0.875 0.52777778 0.61290323 0.64705882 0. 0.51428571 0.83333333 0.8125 ] mean value: 0.5963209751925671 key: train_precision value: [1. 
0.51149425 0.93442623 0.60204082 0.65517241 0.80104712 1. 0.60958904 0.86466165 0.77272727] mean value: 0.7751158800878744 key: test_recall value: [0.1 0.9 0.35 0.95 0.9047619 0.52380952 0. 0.85714286 0.5 0.65 ] mean value: 0.5735714285714286 key: train_recall value: [0.07065217 0.9673913 0.30978261 0.96195652 0.93442623 0.83606557 0.06010929 0.9726776 0.625 0.73913043] mean value: 0.6477191732002852 key: test_roc_auc value: [0.54107143 0.77142857 0.66607143 0.82321429 0.84329004 0.70735931 0.5 0.77402597 0.73181818 0.79772727] mean value: 0.7156006493506493 key: train_roc_auc value: [0.53532609 0.81301292 0.65087524 0.86350838 0.87703275 0.87995663 0.53005464 0.87211034 0.79446393 0.82948506] mean value: 0.7645825988897821 key: test_jcc value: [0.0952381 0.45 0.33333333 0.51351351 0.57575758 0.40740741 0. 0.47368421 0.45454545 0.56521739] mean value: 0.3868696981626043 key: train_jcc value: [0.07065217 0.50282486 0.30319149 0.58803987 0.62637363 0.69230769 0.06010929 0.5993266 0.56930693 0.60714286] mean value: 0.4619275384602773 MCC on Blind test: 0.58 Accuracy on Blind test: 0.84 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02798748 0.0284133 0.03016829 0.03018427 0.03876376 0.03744674 0.05252314 0.02689528 0.03849101 0.02904224] mean value: 0.033991551399230956 key: score_time value: [0.01247096 0.01632237 0.0129602 0.01235032 0.01333857 0.0248065 0.01258373 0.01246619 0.01255035 0.0125165 ] mean value: 0.01423656940460205 key: test_mcc value: [0.55205245 0.59285714 0.56294295 0.69006556 0.31077631 0.31077631 0.52663543 0.54701077 0.48900965 0.62764591] mean value: 0.520977247933603 key: train_mcc value: [0.81526695 0.76569585 0.59161944 0.70559502 0.47781982 0.50535412 0.793226 0.70835718 0.58207667 0.63419533] mean value: 0.6579206367061766 key: test_accuracy value: [0.82894737 0.84210526 0.84210526 0.88157895 0.76315789 0.76315789 0.81578947 0.82894737 0.81333333 0.78666667] mean value: 0.8165789473684211 key: train_accuracy value: [0.92815249 0.90322581 0.84750733 0.88856305 0.81085044 0.81964809 0.91788856 0.89002933 0.84480234 0.80380673] mean value: 0.8654474180238125 key: test_fscore value: [0.66666667 0.7 0.6 0.76923077 0.30769231 0.30769231 0.65 0.64864865 0.46153846 0.71428571] mean value: 0.5825754875754876 key: train_fscore value: [0.86350975 0.83076923 0.62318841 0.76969697 0.4691358 0.50996016 0.84946237 0.7706422 0.61594203 0.72653061] mean value: 0.7028837526055274 key: test_precision value: [0.68421053 0.7 0.9 0.78947368 0.8 0.8 0.68421053 0.75 1. 
0.55555556] mean value: 0.7663450292397661 key: train_precision value: [0.88571429 0.78640777 0.93478261 0.86986301 0.95 0.94117647 0.83597884 0.875 0.92391304 0.58169935] mean value: 0.858453537154942 key: test_recall value: [0.65 0.7 0.45 0.75 0.19047619 0.19047619 0.61904762 0.57142857 0.3 1. ] mean value: 0.5421428571428571 key: train_recall value: [0.8423913 0.88043478 0.4673913 0.69021739 0.31147541 0.34972678 0.86338798 0.68852459 0.46195652 0.9673913 ] mean value: 0.6522897362794012 key: test_roc_auc value: [0.77142857 0.79642857 0.71607143 0.83928571 0.58614719 0.58614719 0.75497835 0.74935065 0.65 0.85454545] mean value: 0.7304383116883116 key: train_roc_auc value: [0.90111533 0.89604068 0.72767156 0.82603239 0.65273169 0.67085537 0.90063186 0.82622622 0.72396423 0.85543914] mean value: 0.7980708486147069 key: test_jcc value: [0.5 0.53846154 0.42857143 0.625 0.18181818 0.18181818 0.48148148 0.48 0.3 0.55555556] mean value: 0.42727063677063676 key: train_jcc value: [0.75980392 0.71052632 0.45263158 0.62561576 0.30645161 0.34224599 0.73831776 0.62686567 0.44502618 0.57051282] mean value: 0.5577997609234735 MCC on Blind test: 0.61 Accuracy on Blind test: 0.85 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.24267983 0.22689581 0.22911096 0.23088861 0.2304976 0.23003793 0.22936296 0.22639275 0.22763729 0.22939324] mean value: 0.23028969764709473 key: score_time value: [0.01547527 0.01565886 0.01581311 0.01586604 0.0156765 0.01581097 0.01544333 0.01569009 0.01573777 0.01564431] mean value: 0.01568162441253662 key: test_mcc value: [0.59285714 0.73862221 0.75907212 0.76668414 0.64368314 0.64368314 0.67099567 0.73049431 0.59090909 0.68863504] mean value: 0.68256359998009 key: train_mcc value: [0.93301467 0.91490177 0.94027897 0.93043955 0.92981722 0.94746341 0.92593033 0.9327836 0.92589187 0.93330543] mean value: 0.9313826823116439 key: test_accuracy value: [0.84210526 0.89473684 0.90789474 0.90789474 0.85526316 0.85526316 0.86842105 0.89473684 0.84 0.88 ] mean value: 0.8746315789473684 key: train_accuracy value: [0.97360704 0.96627566 0.97653959 0.97214076 0.97214076 0.97947214 0.97067449 0.97360704 0.97071742 0.97364568] mean value: 0.9728820581959013 key: test_fscore value: [0.7 0.80952381 0.82051282 0.82926829 0.74418605 0.74418605 0.76190476 0.8 0.7 0.76923077] mean value: 0.7678812546878344 key: train_fscore value: [0.95108696 0.93800539 0.95628415 0.94933333 0.94878706 0.96132597 0.94594595 0.95081967 0.94594595 0.95135135] mean value: 0.9498885777915945 key: test_precision value: [0.7 0.77272727 0.84210526 0.80952381 0.72727273 0.72727273 0.76190476 0.84210526 0.7 0.78947368] mean value: 0.7672385509227614 key: train_precision value: [0.95108696 0.93048128 0.96153846 0.93193717 0.93617021 0.97206704 0.93582888 0.95081967 0.94086022 0.94623656] mean 
value: 0.9457026449459676 key: test_recall value: [0.7 0.85 0.8 0.85 0.76190476 0.76190476 0.76190476 0.76190476 0.7 0.75 ] mean value: 0.7697619047619048 key: train_recall value: [0.95108696 0.94565217 0.95108696 0.9673913 0.96174863 0.95081967 0.95628415 0.95081967 0.95108696 0.95652174] mean value: 0.9542498218104063 key: test_roc_auc value: [0.79642857 0.88035714 0.87321429 0.88928571 0.82640693 0.82640693 0.83549784 0.85367965 0.79545455 0.83863636] mean value: 0.8415367965367966 key: train_roc_auc value: [0.96650733 0.95977388 0.96851537 0.97064344 0.96885027 0.97039982 0.96611803 0.9663918 0.96452143 0.96824083] mean value: 0.9669962197880291 key: test_jcc value: [0.53846154 0.68 0.69565217 0.70833333 0.59259259 0.59259259 0.61538462 0.66666667 0.53846154 0.625 ] mean value: 0.6253145051405921 key: train_jcc value: [0.90673575 0.88324873 0.91623037 0.9035533 0.9025641 0.92553191 0.8974359 0.90625 0.8974359 0.90721649] mean value: 0.9046202455419211 MCC on Blind test: 0.77 Accuracy on Blind test: 0.91 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.09009361 0.10407043 0.09775949 0.11060524 0.10526705 0.10704541 0.05889678 0.09313583 0.11835146 0.11643147] mean value: 0.10016567707061767 key: score_time value: [0.0344286 0.02585244 0.0223434 0.03886867 0.02385998 0.02790022 0.02931643 0.03040266 0.03983021 0.02888322] mean value: 0.030168581008911132 key: test_mcc value: [0.72857143 0.8045087 0.79161589 0.79642857 0.66254135 0.70856367 0.67099567 0.73049431 0.79545455 0.79069549] mean value: 0.747986962461053 key: train_mcc value: [0.99255719 0.99260991 0.99255719 0.98140504 0.96257518 0.98133991 
0.98133991 0.97754897 0.98509865 0.97770013] mean value: 0.982473206049973 key: test_accuracy value: [0.89473684 0.92105263 0.92105263 0.92105263 0.86842105 0.88157895 0.86842105 0.89473684 0.92 0.92 ] mean value: 0.9011052631578947 key: train_accuracy value: [0.99706745 0.99706745 0.99706745 0.99266862 0.98533724 0.99266862 0.99266862 0.99120235 0.99414348 0.99121523] mean value: 0.9931106512153128 key: test_fscore value: [0.8 0.85714286 0.84210526 0.85 0.75 0.79069767 0.76190476 0.8 0.85 0.84210526] mean value: 0.8143955819782014 key: train_fscore value: [0.99456522 0.99459459 0.99456522 0.9862259 0.97206704 0.98614958 0.98614958 0.98342541 0.98907104 0.98342541] mean value: 0.987023899975587 key: test_precision value: [0.8 0.81818182 0.88888889 0.85 0.78947368 0.77272727 0.76190476 0.84210526 0.85 0.88888889] mean value: 0.8262170577960052 key: train_precision value: [0.99456522 0.98924731 0.99456522 1. 0.99428571 1. 1. 0.99441341 0.99450549 1. ] mean value: 0.9961582363223004 key: test_recall value: [0.8 0.9 0.8 0.85 0.71428571 0.80952381 0.76190476 0.76190476 0.85 0.8 ] mean value: 0.8047619047619048 key: train_recall value: [0.99456522 1. 
0.99456522 0.97282609 0.95081967 0.9726776 0.9726776 0.9726776 0.98369565 0.9673913 ] mean value: 0.9781895937277263 key: test_roc_auc value: [0.86428571 0.91428571 0.88214286 0.89821429 0.82077922 0.85930736 0.83549784 0.85367965 0.89772727 0.88181818] mean value: 0.8707738095238096 key: train_roc_auc value: [0.99627859 0.99799197 0.99627859 0.98641304 0.97440783 0.9863388 0.9863388 0.98533679 0.99084582 0.98369565] mean value: 0.9883925892357555 key: test_jcc value: [0.66666667 0.75 0.72727273 0.73913043 0.6 0.65384615 0.61538462 0.66666667 0.73913043 0.72727273] mean value: 0.6885370426674774 key: train_jcc value: [0.98918919 0.98924731 0.98918919 0.97282609 0.94565217 0.9726776 0.9726776 0.9673913 0.97837838 0.9673913 ] mean value: 0.9744620129406761 MCC on Blind test: 0.79 Accuracy on Blind test: 0.91 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.21727848 0.22791338 0.24239659 0.35803866 0.32010889 0.32323623 0.31458378 0.28612709 0.32833815 0.32110643] mean value: 0.29391276836395264 key: score_time value: [0.03302312 0.03829074 0.03578234 0.03027177 0.03229713 0.03090048 0.01834702 0.01790905 0.03131437 0.03010559] mean value: 0.029824161529541017 key: test_mcc value: [0.3790877 0.44270548 0.38615095 0.33584695 0.37798833 0.36333066 0.31195458 0.36333066 0.47535069 0.37787109] mean value: 0.3813617084212999 key: train_mcc value: [0.9256841 0.92196808 0.91453396 0.91453396 0.91048427 0.91794953 0.91390497 0.92914132 0.91829032 0.92943378] mean value: 0.9195924280035491 key: test_accuracy value: [0.78947368 0.80263158 0.78947368 0.77631579 0.77631579 0.77631579 0.76315789 0.77631579 0.81333333 0.78666667] mean value: 0.785 key: train_accuracy value: [0.97067449 0.96920821 0.96627566 0.96627566 0.96480938 0.96774194 0.96627566 0.97214076 0.96778917 0.97218155] mean value: 0.9683372476953925 key: test_fscore value: [0.38461538 0.54545455 0.46666667 0.4137931 0.48484848 0.4137931 0.35714286 0.4137931 0.5 0.38461538] mean value: 0.4364722633688151 key: train_fscore value: [0.94252874 0.93948127 0.93333333 0.93333333 0.92982456 0.93604651 0.93333333 0.94524496 0.93641618 0.94555874] mean value: 0.9375100957673573 key: test_precision value: [0.83333333 0.69230769 0.7 0.66666667 0.66666667 0.75 0.71428571 0.75 0.875 0.83333333] mean value: 0.7481593406593406 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.99382716 1. 1. 1. 
] mean value: 0.9993827160493827 key: test_recall value: [0.25 0.45 0.35 0.3 0.38095238 0.28571429 0.23809524 0.28571429 0.35 0.25 ] mean value: 0.314047619047619 key: train_recall value: [0.89130435 0.88586957 0.875 0.875 0.86885246 0.87978142 0.87978142 0.89617486 0.88043478 0.89673913] mean value: 0.8828937990021383 key: test_roc_auc value: [0.61607143 0.68928571 0.64821429 0.62321429 0.65411255 0.62467532 0.6008658 0.62467532 0.66590909 0.61590909] mean value: 0.63629329004329 key: train_roc_auc value: [0.94565217 0.94293478 0.9375 0.9375 0.93442623 0.93989071 0.93888871 0.94808743 0.94021739 0.94836957] mean value: 0.9413466991002676 key: test_jcc value: [0.23809524 0.375 0.30434783 0.26086957 0.32 0.26086957 0.2173913 0.26086957 0.33333333 0.23809524] mean value: 0.2808871635610766 key: train_jcc value: [0.89130435 0.88586957 0.875 0.875 0.86885246 0.87978142 0.875 0.89617486 0.88043478 0.89673913] mean value: 0.8824156569256355 MCC on Blind test: 0.35 Accuracy on Blind test: 0.78 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.01005697 0.99557376 1.00276566 0.99281025 1.00908518 0.99489617 1.00263834 1.00866675 0.99828649 0.99859738] mean value: 1.0013376951217652 key: score_time value: [0.01030517 0.009691 0.01017618 0.00980783 0.07066965 0.01032805 0.00980544 0.00971413 0.00966191 0.00945163] mean value: 0.015961098670959472 key: test_mcc value: [0.71205323 0.77709656 0.87039519 0.83350524 0.63304195 0.71964027 0.70856367 0.69986305 0.79545455 0.8035183 ] mean value: 0.7553132004938042 key: train_mcc value: [1. 0.99627913 1. 0.9888617 1. 1. 1. 1. 1. 1. ] mean value: 0.9985140825524362 key: test_accuracy value: [0.88157895 0.90789474 0.94736842 0.93421053 0.85526316 0.88157895 0.88157895 0.88157895 0.92 0.92 ] mean value: 0.9011052631578947 key: train_accuracy value: [1. 0.99853372 1. 0.99560117 1. 1. 1. 1. 1. 1. ] mean value: 0.9994134897360704 key: test_fscore value: [0.79069767 0.8372093 0.9047619 0.87804878 0.73170732 0.8 0.79069767 0.7804878 0.85 0.85714286] mean value: 0.8220753315506577 key: train_fscore value: [1. 0.9972752 1. 0.99186992 1. 1. 1. 1. 1. 1. ] mean value: 0.998914512305886 key: test_precision value: [0.73913043 0.7826087 0.86363636 0.85714286 0.75 0.75 0.77272727 0.8 0.85 0.81818182] mean value: 0.7983427442123094 key: train_precision value: [1. 1. 1. 0.98918919 1. 1. 1. 1. 1. 1. ] mean value: 0.9989189189189189 key: test_recall value: [0.85 0.9 0.95 0.9 0.71428571 0.85714286 0.80952381 0.76190476 0.85 0.9 ] mean value: 0.8492857142857143 key: train_recall value: [1. 0.99456522 1. 0.99456522 1. 1. 1. 1. 1. 1. 
] mean value: 0.9989130434782608 key: test_roc_auc value: [0.87142857 0.90535714 0.94821429 0.92321429 0.81168831 0.87402597 0.85930736 0.84458874 0.89772727 0.91363636] mean value: 0.8849188311688312 key: train_roc_auc value: [1. 0.99728261 1. 0.99527458 1. 1. 1. 1. 1. 1. ] mean value: 0.999255718526279 key: test_jcc value: [0.65384615 0.72 0.82608696 0.7826087 0.57692308 0.66666667 0.65384615 0.64 0.73913043 0.75 ] mean value: 0.7009108138238573 key: train_jcc value: [1. 0.99456522 1. 0.98387097 1. 1. 1. 1. 1. 1. ] mean value: 0.9978436185133239 MCC on Blind test: 0.79 Accuracy on Blind test: 0.91 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03526473 0.0357461 0.03524756 0.04501867 0.03488183 0.0351634 0.04225397 0.03609633 0.03565288 0.06816936] mean value: 0.040349483489990234 key: score_time value: [0.01241183 0.01264811 0.01375818 0.01282477 0.01378298 0.01384163 0.01376128 0.02365279 0.01395798 0.0260396 ] mean value: 0.01566791534423828 key: test_mcc value: [ 0.19855331 0.08872443 0.15357143 0.20690038 0.18687064 0.21349671 0.13659979 0.2289763 -0.05397347 0.26827168] mean value: 0.16279911918256598 key: train_mcc value: [0.36535467 0.29778297 0.3516267 0.28795303 0.28968809 0.39700229 0.30777947 0.37907736 0.31662698 
0.30000249] mean value: 0.3292894060024899 key: test_accuracy value: [0.48684211 0.39473684 0.44736842 0.42105263 0.40789474 0.55263158 0.43421053 0.56578947 0.34666667 0.48 ] mean value: 0.453719298245614 key: train_accuracy value: [0.53519062 0.46334311 0.52052786 0.45307918 0.45454545 0.56891496 0.47360704 0.54985337 0.48316252 0.46559297] mean value: 0.4967817074060875 key: test_fscore value: [0.46575342 0.425 0.44736842 0.46341463 0.47058824 0.48484848 0.4556962 0.49230769 0.37974684 0.49350649] mean value: 0.4578230423787979 key: train_fscore value: [0.53722628 0.5013624 0.5294964 0.49662618 0.49593496 0.55454545 0.50482759 0.54383358 0.51040222 0.50204638] mean value: 0.517630144384987 key: test_precision value: [0.32075472 0.28333333 0.30357143 0.30645161 0.3125 0.35555556 0.31034483 0.36363636 0.25423729 0.33333333] mean value: 0.31437184600361723 key: train_precision value: [0.36726547 0.33454545 0.36007828 0.33034111 0.32972973 0.3836478 0.33763838 0.37346939 0.34264432 0.33515483] mean value: 0.3494514754466544 key: test_recall value: [0.85 0.85 0.85 0.95 0.95238095 0.76190476 0.85714286 0.76190476 0.75 0.95 ] mean value: 0.8533333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 key: test_roc_auc value: [0.60357143 0.54107143 0.57678571 0.59107143 0.57619048 0.61731602 0.56493506 0.62640693 0.475 0.62954545] mean value: 0.580189393939394 key: train_roc_auc value: [0.68172691 0.63253012 0.67168675 0.62550201 0.62725451 0.70541082 0.64028056 0.69238477 0.64629259 0.63426854] mean value: 0.6557337566699665 key: test_jcc value: [0.30357143 0.26984127 0.28813559 0.3015873 0.30769231 0.32 0.29508197 0.32653061 0.234375 0.32758621] mean value: 0.2974401687267211 key: train_jcc value: [0.36726547 0.33454545 0.36007828 0.33034111 0.32972973 0.3836478 0.33763838 0.37346939 0.34264432 0.33515483] mean value: 0.3494514754466544 MCC on Blind test: 0.14 Accuracy on Blind test: 0.42 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, 
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0312705 0.03619003 0.03808308 0.03740096 0.01732135 0.01709747 0.01744127 0.02994561 0.04224467 0.03739405] mean value: 0.030438899993896484 key: score_time value: [0.02949142 0.02922988 0.02837372 0.01918077 0.01225829 0.0122056 0.01218557 0.01889825 0.02776265 0.01940441] mean value: 0.020899057388305664 key: test_mcc value: [0.58196658 0.63304195 0.56390496 0.67273572 0.64368314 0.28501393 0.44503488 0.62471635 0.72727273 0.61930936] mean value: 0.5796679617856088 key: train_mcc value: [0.74528173 0.74306574 0.72650934 0.73282742 0.74496772 0.74699975 0.74021989 0.73268489 0.71290303 0.73493383] mean value: 0.7360393351702404 key: test_accuracy value: [0.82894737 0.85526316 0.84210526 0.86842105 0.85526316 0.73684211 0.78947368 0.85526316 0.89333333 0.85333333] mean value: 0.8378245614035088 key: train_accuracy value: [0.90029326 0.90029326 0.89442815 0.89589443 0.90175953 0.90175953 0.89882698 0.89589443 0.88872621 0.89751098] mean value: 0.8975386748989923 key: test_fscore value: [0.69767442 0.73170732 0.64705882 0.76190476 0.74418605 0.44444444 0.57894737 0.71794872 0.8 0.71794872] mean value: 0.6841820616386557 key: train_fscore value: [0.81318681 0.81005587 0.79661017 0.8033241 0.8101983 0.81337047 0.80886427 0.8033241 0.7877095 0.80337079] mean value: 0.8050014371518536 key: test_precision value: [0.65217391 0.71428571 0.78571429 0.72727273 0.72727273 0.53333333 0.64705882 0.77777778 0.8 0.73684211] mean value: 0.7101731407492614 key: train_precision value: [0.82222222 0.83333333 0.82941176 0.81920904 0.84117647 0.82954545 0.82022472 
0.81460674 0.81034483 0.83139535] mean value: 0.8251469922040724 key: test_recall value: [0.75 0.75 0.55 0.8 0.76190476 0.38095238 0.52380952 0.66666667 0.8 0.7 ] mean value: 0.6683333333333333 key: train_recall value: [0.80434783 0.78804348 0.76630435 0.78804348 0.78142077 0.79781421 0.79781421 0.79234973 0.76630435 0.77717391] mean value: 0.7859616298408173 key: test_roc_auc value: [0.80357143 0.82142857 0.74821429 0.84642857 0.82640693 0.62683983 0.70735931 0.7969697 0.86363636 0.80454545] mean value: 0.7845400432900433 key: train_roc_auc value: [0.8700454 0.86490527 0.85403571 0.86189323 0.86365627 0.86884698 0.86684298 0.86310873 0.85008604 0.85952884] mean value: 0.8622949451889779 key: test_jcc value: [0.53571429 0.57692308 0.47826087 0.61538462 0.59259259 0.28571429 0.40740741 0.56 0.66666667 0.56 ] mean value: 0.5278663799968147 key: train_jcc value: [0.68518519 0.68075117 0.66197183 0.6712963 0.68095238 0.68544601 0.67906977 0.6712963 0.64976959 0.6713615 ] mean value: 0.67371000278574 MCC on Blind test: 0.59 Accuracy on Blind test: 0.84 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:115: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:118: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25372815 0.26200867 0.38737082 0.51150823 0.44220877 0.36624289 0.32012987 0.29179287 0.28646541 0.35943341] mean value: 0.3480889081954956 key: score_time value: [0.01247811 0.01930356 0.02239323 0.02633381 0.02799511 0.01895785 0.03551054 0.01894855 0.0190196 0.01937056] mean value: 0.022031092643737794 key: test_mcc value: [0.58196658 0.63304195 0.56390496 0.67273572 0.64368314 0.28501393 0.44503488 0.62471635 0.72727273 0.61930936] mean value: 0.5796679617856088 key: train_mcc value: [0.74528173 0.74306574 0.72650934 0.73282742 0.74496772 0.74699975 0.74021989 0.73268489 0.71290303 0.73493383] mean value: 0.7360393351702404 key: test_accuracy value: [0.82894737 0.85526316 0.84210526 0.86842105 0.85526316 0.73684211 0.78947368 0.85526316 0.89333333 0.85333333] mean value: 0.8378245614035088 key: train_accuracy value: [0.90029326 0.90029326 0.89442815 0.89589443 0.90175953 0.90175953 0.89882698 0.89589443 0.88872621 0.89751098] mean value: 0.8975386748989923 key: test_fscore value: [0.69767442 0.73170732 0.64705882 0.76190476 0.74418605 0.44444444 0.57894737 0.71794872 0.8 0.71794872] mean value: 0.6841820616386557 key: train_fscore value: [0.81318681 0.81005587 0.79661017 0.8033241 0.8101983 0.81337047 0.80886427 0.8033241 0.7877095 0.80337079] mean value: 0.8050014371518536 key: test_precision value: [0.65217391 0.71428571 0.78571429 0.72727273 0.72727273 
0.53333333 0.64705882 0.77777778 0.8 0.73684211] mean value: 0.7101731407492614 key: train_precision value: [0.82222222 0.83333333 0.82941176 0.81920904 0.84117647 0.82954545 0.82022472 0.81460674 0.81034483 0.83139535] mean value: 0.8251469922040724 key: test_recall value: [0.75 0.75 0.55 0.8 0.76190476 0.38095238 0.52380952 0.66666667 0.8 0.7 ] mean value: 0.6683333333333333 key: train_recall value: [0.80434783 0.78804348 0.76630435 0.78804348 0.78142077 0.79781421 0.79781421 0.79234973 0.76630435 0.77717391] mean value: 0.7859616298408173 key: test_roc_auc value: [0.80357143 0.82142857 0.74821429 0.84642857 0.82640693 0.62683983 0.70735931 0.7969697 0.86363636 0.80454545] mean value: 0.7845400432900433 key: train_roc_auc value: [0.8700454 0.86490527 0.85403571 0.86189323 0.86365627 0.86884698 0.86684298 0.86310873 0.85008604 0.85952884] mean value: 0.8622949451889779 key: test_jcc value: [0.53571429 0.57692308 0.47826087 0.61538462 0.59259259 0.28571429 0.40740741 0.56 0.66666667 0.56 ] mean value: 0.5278663799968147 key: train_jcc value: [0.68518519 0.68075117 0.66197183 0.6712963 0.68095238 0.68544601 0.67906977 0.6712963 0.64976959 0.6713615 ] mean value: 0.67371000278574 MCC on Blind test: 0.59 Accuracy on Blind test: 0.84 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04412484 0.04355478 0.05054665 0.05056119 0.04378176 0.04390931 0.04411793 0.05126095 0.05250835 0.05323243] mean value: 0.04775981903076172 key: score_time value: [0.01327801 0.01320815 0.01321435 0.01340771 0.02071118 0.01331592 0.01335812 0.01337099 0.01346803 0.01352692] mean value: 0.014085936546325683 key: test_mcc value: [0.65875884 0.76590909 0.78434561 0.79230071 0.82447186 0.75530907 0.76675488 0.75530907 0.8376106 0.83984125] mean value: 0.7780610974433546 key: train_mcc value: [0.82698297 0.82612027 0.83101258 0.81404424 0.82455243 0.82777216 0.82298562 0.81837105 0.80646893 0.82284436] mean value: 0.8221154595374749 key: test_accuracy value: [0.82882883 0.88288288 0.89189189 0.89189189 0.90990991 0.87387387 0.88288288 0.87387387 0.91818182 0.91818182] mean value: 0.8872399672399672 key: train_accuracy value: [0.91173521 0.91173521 0.91374122 0.90571715 0.9107322 0.91273821 0.90972919 0.90772317 0.90180361 0.90981964] mean value: 0.9095474801156977 key: test_fscore value: [0.83185841 0.88288288 0.89285714 0.89830508 0.91525424 0.88333333 0.88695652 0.88333333 0.92035398 0.92173913] mean value: 0.8916874055995034 key: train_fscore value: [0.91570881 0.91522158 0.91762452 0.90944123 0.91434071 0.91577928 0.91362764 0.91136802 0.90576923 0.91362764] mean value: 0.9132508666793058 key: test_precision value: [0.81034483 0.875 0.87719298 0.84126984 0.87096774 0.828125 0.86440678 0.828125 0.89655172 0.88333333] mean value: 0.8575317230379954 key: train_precision value: [0.87706422 0.8812616 0.87889908 0.87569573 0.8780037 0.88411215 0.875 
0.87592593 0.87060998 0.87661142] mean value: 0.8773183803018094 key: test_recall value: [0.85454545 0.89090909 0.90909091 0.96363636 0.96428571 0.94642857 0.91071429 0.94642857 0.94545455 0.96363636] mean value: 0.929512987012987 key: train_recall value: [0.95791583 0.95190381 0.95991984 0.94589178 0.95381526 0.9497992 0.95582329 0.9497992 0.94388778 0.95390782] mean value: 0.952266380149858 key: test_roc_auc value: [0.82905844 0.88295455 0.89204545 0.89253247 0.90941558 0.87321429 0.88262987 0.87321429 0.91818182 0.91818182] mean value: 0.8871428571428571 key: train_roc_auc value: [0.91168884 0.91169488 0.91369486 0.90567682 0.91077537 0.91277535 0.90977537 0.90776533 0.90180361 0.90981964] mean value: 0.9095470056579021 key: test_jcc value: [0.71212121 0.79032258 0.80645161 0.81538462 0.84375 0.79104478 0.796875 0.79104478 0.85245902 0.85483871] mean value: 0.8054292299363882 key: train_jcc value: [0.84452297 0.84369449 0.84778761 0.83392226 0.84219858 0.84464286 0.8409894 0.83716814 0.82776801 0.8409894 ] mean value: 0.8403683727027139 MCC on Blind test: 0.64 Accuracy on Blind test: 0.84 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.17848635 1.02553439 1.21671796 1.04096627 1.16208816 1.04139638 1.0986526 1.04701781 1.10708237 1.07402682] mean value: 1.0991969108581543 key: score_time value: [0.01554346 0.01578808 0.01572728 0.01564598 0.01846838 0.013484 0.01977873 0.01353335 0.01727462 0.01542902] mean value: 0.01606729030609131 key: test_mcc value: [0.71168831 0.78434561 0.856354 0.80845318 0.76868784 0.78818464 0.80286425 0.77570306 0.78181818 0.80119274] mean value: 0.7879291811034943 key: train_mcc value: [0.84975048 0.83532183 0.86829651 0.83177022 0.88050193 0.8760821 0.84009966 0.83344638 0.8317077 0.88033245] mean value: 0.8527309266989362 key: test_accuracy value: [0.85585586 0.89189189 0.92792793 0.9009009 0.88288288 0.89189189 0.9009009 0.88288288 0.89090909 0.9 ] mean value: 0.8926044226044226 key: train_accuracy value: [0.92377131 0.91675025 0.9338014 0.91474423 0.93981946 0.93781344 0.91875627 0.91574724 0.91482966 0.93987976] mean value: 0.9255913029670173 key: test_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.85454545 0.89285714 0.92592593 0.90598291 0.88888889 0.89830508 0.90434783 0.89256198 0.89090909 0.90265487] mean value: 0.895697917066984 key: train_fscore value: [0.92649903 0.91949564 0.93516699 0.9178744 0.9410609 0.93873518 0.92173913 0.9184466 0.91771539 0.94094488] mean value: 0.9277678146355568 key: test_precision value: [0.85454545 0.87719298 0.94339623 0.85483871 0.85245902 0.85483871 0.88135593 0.83076923 0.89090909 0.87931034] mean value: 0.8719615697874268 key: train_precision value: [0.8953271 0.89097744 0.91714836 0.88619403 0.92115385 0.92412451 0.88826816 0.88909774 0.88764045 0.9245648 ] mean value: 0.9024496445400005 key: test_recall value: [0.85454545 0.90909091 0.90909091 0.96363636 0.92857143 0.94642857 0.92857143 0.96428571 0.89090909 0.92727273] mean value: 0.9222402597402597 key: train_recall value: [0.95991984 0.9498998 0.95390782 0.95190381 0.96184739 0.95381526 0.95783133 0.9497992 0.9498998 0.95791583] mean value: 0.9546740066478338 key: test_roc_auc value: [0.85584416 0.89204545 0.92775974 0.90146104 0.88246753 0.8913961 0.90064935 0.88214286 0.89090909 0.9 ] mean value: 0.8924675324675325 key: train_roc_auc value: [0.92373502 0.91671697 0.93378122 0.91470692 0.93984153 0.93782947 0.91879542 0.91578136 0.91482966 0.93987976] mean value: 0.9255897336842359 key: test_jcc value: [0.74603175 0.80645161 0.86206897 0.828125 0.8 0.81538462 0.82539683 0.80597015 0.80327869 0.82258065] mean value: 0.8115288248173266 key: train_jcc value: [0.86306306 0.85098743 0.87822878 0.84821429 0.88868275 0.88454376 0.85483871 0.8491921 0.84794275 0.88847584] mean value: 0.8654169472771298 MCC on Blind test: 0.64 Accuracy on Blind test: 0.85 
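Note on reading the metrics above: the blind-test MCC (0.64) is much lower than the blind-test accuracy (0.85), which is expected when the classes are imbalanced (roughly 2.7:1 in this run). The sketch below shows how the three headline scores are defined from a binary confusion matrix. It is a minimal illustration, not the script's own code: the function name `binary_metrics` and the confusion counts are hypothetical, and reading `jcc` as the Jaccard index of the positive class is an assumption based on the key name.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, MCC and Jaccard index from binary confusion counts.

    Hypothetical helper for illustration; not taken from the logged script.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Matthews correlation coefficient: uses all four cells, so it is
    # penalised when either class is predicted poorly.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Jaccard index of the positive class (assumed meaning of "jcc").
    union = tp + fp + fn
    jaccard = tp / union if union else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "jcc": jaccard}


# Made-up counts, NOT taken from the run above.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
perfect = binary_metrics(tp=10, fp=0, tn=10, fn=0)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.4f}")
```

Because MCC is a correlation between predicted and true labels, it ranges from -1 to 1 and sits below accuracy for any classifier that is better than chance but imperfect. For that reason the `test_mcc` and blind-test MCC values are the more conservative figures to compare across models.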
Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', 
RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0201087 0.01285744 0.01268411 0.01318502 0.0148437 0.01225019 0.01224089 0.01258802 0.01277614 0.01228714] mean value: 0.013582134246826172 key: score_time value: [0.01278663 0.00968266 0.00952196 0.00947142 0.00946116 0.00926948 0.00921249 0.00973344 0.00934577 0.00983071] mean value: 0.009831571578979492 key: test_mcc value: [0.57297043 0.50450356 0.49545455 0.62268473 0.67564935 0.64303575 0.46096379 0.55139323 0.71097366 0.67451348] mean value: 0.5912142519514377 key: train_mcc value: [0.6049526 0.61319134 0.60481447 0.59085458 0.59881849 0.60886065 0.62097028 0.61691639 0.59338778 0.57915832] mean value: 0.6031924888496714 key: test_accuracy value: [0.78378378 0.74774775 0.74774775 0.81081081 0.83783784 0.81981982 0.72972973 0.77477477 0.85454545 0.83636364] mean value: 0.7943161343161343 key: train_accuracy value: [0.80240722 0.80541625 0.80240722 0.79538616 0.79939819 0.80441324 0.81043129 0.80842528 0.79659319 0.78957916] mean value: 0.801445719925307 key: test_fscore value: [0.76470588 0.76666667 0.74545455 0.81415929 0.83928571 0.83050847 0.72222222 0.78632479 0.85964912 0.83018868] mean value: 0.7959165385970846 key: train_fscore value: [0.80475719 0.81381958 
0.8028028 0.79393939 0.8 0.80519481 0.8119403 0.80957129 0.7992087 0.78957916] mean value: 0.8030813212223025 key: test_precision value: [0.82978723 0.70769231 0.74545455 0.79310345 0.83928571 0.79032258 0.75 0.75409836 0.83050847 0.8627451 ] mean value: 0.7902997763667369 key: train_precision value: [0.79607843 0.78084715 0.802 0.80040733 0.79681275 0.80119284 0.80473373 0.8039604 0.7890625 0.78957916] mean value: 0.7964674282949357 key: test_recall value: [0.70909091 0.83636364 0.74545455 0.83636364 0.83928571 0.875 0.69642857 0.82142857 0.89090909 0.8 ] mean value: 0.8050324675324675 key: train_recall value: [0.81362725 0.8496994 0.80360721 0.78757515 0.80321285 0.80923695 0.81927711 0.81526104 0.80961924 0.78957916] mean value: 0.8100695366636889 key: test_roc_auc value: [0.78311688 0.74853896 0.74772727 0.81103896 0.83782468 0.81931818 0.73003247 0.77435065 0.85454545 0.83636364] mean value: 0.7942857142857143 key: train_roc_auc value: [0.80239596 0.80537179 0.80240602 0.795394 0.79940202 0.80441807 0.81044016 0.80843213 0.79659319 0.78957916] mean value: 0.8014432479416664 key: test_jcc value: [0.61904762 0.62162162 0.5942029 0.68656716 0.72307692 0.71014493 0.56521739 0.64788732 0.75384615 0.70967742] mean value: 0.6631289442461227 key: train_jcc value: [0.67330017 0.68608414 0.67056856 0.65829146 0.66666667 0.67391304 0.68341709 0.680067 0.66556837 0.65231788] mean value: 0.6710194374461457 MCC on Blind test: 0.37 Accuracy on Blind test: 0.74 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01392293 0.01701427 0.01723433 0.01750612 0.01728272 0.01705217 0.01722479 0.01711893 0.01728821 0.01710558] mean value: 0.01687500476837158 key: score_time value: [0.01087117 0.01245928 0.01249266 0.01245236 0.01248884 0.01242113 0.01247978 0.01248288 0.01243448 0.01248121] mean value: 0.01230638027191162 key: test_mcc value: [0.58827674 0.64014294 0.56929191 0.7137294 0.62939373 0.69891539 0.66004053 0.64303575 0.73323558 0.80119274] mean value: 0.667725470808785 key: train_mcc value: [0.70693545 0.72572666 0.68759086 0.71792245 0.68682818 0.71394469 0.68575288 0.7136947 0.7113771 0.71018107] mean value: 0.7059954028409837 key: test_accuracy value: [0.79279279 0.81981982 0.78378378 0.85585586 0.81081081 0.84684685 0.82882883 0.81981982 0.86363636 0.9 ] mean value: 0.8322194922194922 key: train_accuracy value: [0.85155466 0.86158475 0.84252758 0.85757272 0.84152457 0.8555667 0.84152457 0.8555667 0.85470942 0.85370741] mean value: 0.8515839100467736 key: test_fscore value: [0.8 0.82142857 0.78947368 0.85964912 0.82644628 0.85714286 0.83760684 0.83050847 0.87179487 0.90265487] mean value: 0.8396705567815326 key: train_fscore value: [0.85904762 0.86730769 0.84918348 0.86372361 0.84923664 0.86153846 0.84807692 0.86127168 0.85990338 0.85988484] mean value: 0.8579174317858217 key: test_precision value: [0.76666667 0.80701754 0.76271186 0.83050847 0.76923077 0.80952381 0.80327869 0.79032258 0.82258065 0.87931034] mean value: 0.8041151387422574 key: train_precision value: [0.8185118 0.8336414 0.81549815 0.82872928 0.80909091 0.82656827 0.81365314 0.82777778 0.83022388 
0.82504604] mean value: 0.8228740648484011 key: test_recall value: [0.83636364 0.83636364 0.81818182 0.89090909 0.89285714 0.91071429 0.875 0.875 0.92727273 0.92727273] mean value: 0.8789935064935065 key: train_recall value: [0.90380762 0.90380762 0.88577154 0.90180361 0.8935743 0.89959839 0.88554217 0.89759036 0.89178357 0.89779559] mean value: 0.896107475996169 key: test_roc_auc value: [0.79318182 0.81996753 0.78409091 0.85616883 0.81006494 0.84626623 0.82840909 0.81931818 0.86363636 0.9 ] mean value: 0.8321103896103896 key: train_roc_auc value: [0.8515022 0.86154236 0.84248417 0.85752831 0.84157673 0.85561082 0.84156868 0.85560881 0.85470942 0.85370741] mean value: 0.8515838906729121 key: test_jcc value: [0.66666667 0.6969697 0.65217391 0.75384615 0.70422535 0.75 0.72058824 0.71014493 0.77272727 0.82258065] mean value: 0.7249922863357584 key: train_jcc value: [0.75292154 0.76570458 0.73789649 0.76013514 0.73797678 0.75675676 0.73622705 0.75634518 0.75423729 0.75420875] mean value: 0.7512409553820072 MCC on Blind test: 0.52 Accuracy on Blind test: 0.8 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01607823 0.01308084 0.01161551 0.01328683 0.01202369 0.01301265 0.01235914 0.01249504 0.01264787 0.01307225] mean value: 0.012967205047607422 key: score_time value: [0.03294516 0.01623058 0.01591921 0.01628637 0.01585436 0.01614761 0.01592636 0.01668 0.01613188 0.01644087] mean value: 0.017856240272521973 key: test_mcc value: [0.62443328 0.62701706 0.67994289 0.79230071 0.62617314 0.6914393 0.63352948 0.76868784 0.76477489 0.67451348] mean value: 0.688281206745035 key: train_mcc value: [0.80337288 0.78897563 0.78668689 0.80851256 0.80226699 0.79915817 0.79748273 0.79748273 0.7955465 0.7970329 ] mean value: 0.7976517990393334 key: test_accuracy value: [0.81081081 0.81081081 0.83783784 0.89189189 0.81081081 0.83783784 0.81081081 0.88288288 0.88181818 0.83636364] mean value: 0.8411875511875512 key: train_accuracy value: [0.90070211 0.89368104 0.89167503 0.90371113 0.8996991 0.89869609 0.89769308 0.89769308 0.89679359 0.89779559] mean value: 0.8978139830312581 key: test_fscore value: [0.8173913 0.82051282 0.84482759 0.89830508 0.82352941 0.85483871 0.82926829 0.88888889 0.88495575 0.83018868] mean value: 0.8492706530284919 key: train_fscore value: [0.90416263 0.89708738 0.89655172 0.90625 0.90366089 0.90184645 0.90116279 0.90116279 0.90029042 0.90077821] mean value: 0.9012953282848261 key: test_precision value: [0.78333333 0.77419355 0.80327869 0.84126984 0.77777778 0.77941176 0.76119403 0.85245902 0.86206897 0.8627451 ] mean value: 0.8097732063799168 key: train_precision value: [0.87453184 0.8700565 0.8587156 0.88380952 0.86851852 0.87382298 0.87078652 0.87078652 
0.87078652 0.87523629] mean value: 0.8717050792015171 key: test_recall value: [0.85454545 0.87272727 0.89090909 0.96363636 0.875 0.94642857 0.91071429 0.92857143 0.90909091 0.8 ] mean value: 0.8951623376623377 key: train_recall value: [0.93587174 0.9258517 0.93787575 0.92985972 0.94176707 0.93172691 0.93373494 0.93373494 0.93186373 0.92785571] mean value: 0.9330142212135114 key: test_roc_auc value: [0.8112013 0.81136364 0.83831169 0.89253247 0.81022727 0.83685065 0.8099026 0.88246753 0.88181818 0.83636364] mean value: 0.8411038961038961 key: train_roc_auc value: [0.9006668 0.89364874 0.89162864 0.90368488 0.89974125 0.89872919 0.89772919 0.89772919 0.89679359 0.89779559] mean value: 0.8978147057166542 key: test_jcc value: [0.69117647 0.69565217 0.73134328 0.81538462 0.7 0.74647887 0.70833333 0.8 0.79365079 0.70967742] mean value: 0.7391696963046386 key: train_jcc value: [0.82508834 0.81338028 0.8125 0.82857143 0.82425308 0.82123894 0.82010582 0.82010582 0.81866197 0.81946903] mean value: 0.8203374701699758 MCC on Blind test: 0.46 Accuracy on Blind test: 0.77 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.06150103 0.05591226 0.06038022 0.05534601 0.05132461 0.05020452 0.05948067 0.06166291 0.05109692 0.06067967] mean value: 0.056758880615234375 key: score_time value: [0.01889443 0.019135 0.01991487 0.020509 0.01846647 0.01887274 0.01900005 0.02215457 0.01905775 0.01903105] mean value: 0.01950359344482422 key: test_mcc value: [0.64014294 0.73290291 0.73290291 0.75592959 0.84439989 0.75979502 0.74951538 0.72309474 0.79022225 0.87402845] mean value: 0.7602934058473255 key: train_mcc value: [0.81034285 0.81117338 0.80484205 0.80964527 0.80931593 0.81371644 0.81962105 0.80783682 0.80397646 0.79958404] mean value: 0.8090054298398948 key: test_accuracy value: [0.81981982 0.86486486 0.86486486 0.87387387 0.91891892 0.87387387 0.87387387 0.85585586 0.89090909 0.93636364] mean value: 0.8773218673218673 key: train_accuracy value: [0.90270812 0.90371113 0.8996991 0.90270812 0.90170512 0.90471414 0.90672016 0.90170512 0.8997996 0.89679359] mean value: 0.9020264199411863 key: test_fscore value: [0.82142857 0.86956522 0.86956522 0.88135593 0.92436975 0.8852459 0.87931034 0.86885246 0.89830508 0.9380531 ] mean value: 0.8836051573887949
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn(
key: train_fscore value: [0.90788224 0.9082218 0.90530303 0.90753098 0.90719697 0.90926457 0.91201514 0.90648855 0.9047619 0.90273843] mean value: 0.9071403609895646 key: test_precision value: [0.80701754 0.83333333 0.83333333 0.82539683 0.87301587 0.81818182 0.85 0.8030303 0.84126984 0.9137931 ] mean value: 0.8398371974869252 key: train_precision value: [0.86281588 0.86837294 0.85816876 0.86545455 0.85842294 0.86703097 0.86225403 0.86363636 0.86206897 0.85357143] mean value: 0.8621796821708623 key: test_recall value: [0.83636364 0.90909091 0.90909091 0.94545455 0.98214286 0.96428571 0.91071429 0.94642857 0.96363636 0.96363636] mean value: 0.9330844155844156 key: train_recall value: [0.95791583 0.95190381 0.95791583 0.95390782 0.96184739 0.95582329 0.96787149 0.95381526 0.95190381 0.95791583] mean value: 0.9570820355570578 key: test_roc_auc value: [0.81996753 0.86525974 0.86525974 0.87451299 0.91834416 0.87305195 0.87353896 0.85503247 0.89090909 0.93636364] mean value: 0.8772240259740259 key: train_roc_auc value: [0.90265269 0.90366275 0.89964065 0.90265672 0.90176538 0.90476535 0.90678143 0.90175733 0.8997996 0.89679359] mean value: 0.9020275490740517 key: test_jcc value: [0.6969697 0.76923077 0.76923077 0.78787879 0.859375 0.79411765 0.78461538 0.76811594 0.81538462 0.88333333] mean value: 0.7928251945731166 key: train_jcc value: [0.83130435 0.83187391 0.82698962 0.83071553 0.83015598 0.83362522 0.83826087 0.82897033 0.82608696 0.82271945] mean value: 0.8300702209936055 MCC on Blind test: 0.61 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.62372518 2.43982482 4.13287544 4.33861852 4.25920129 2.64669728 3.125072 3.28911853 2.51461816 2.4663496 ] mean value: 3.1836100816726685 key: score_time value: [0.01274467 0.01836348 0.01655102 0.01567531 0.01266837 0.01298928 0.01277828 0.01279283 0.01288271 0.01289606] mean value: 0.014034199714660644 key: test_mcc value: [0.69373177 0.84137254 0.80286425 0.805216 0.8049036 0.82447186 0.81980519 0.87733514 0.85681441 0.83650191] mean value: 0.8163016684437298 key: train_mcc value: [0.95991082 0.92035652 0.99200779 0.98595987 0.96427298 0.9526244 0.97000492 0.96613216 0.94644118 0.94994749] mean value: 0.960765812657737 key: test_accuracy value: [0.84684685 0.91891892 0.9009009 0.9009009 0.9009009 0.90990991 0.90990991 0.93693694 0.92727273 0.91818182] mean value: 0.907067977067977 key: train_accuracy value: [0.97993982 0.95987964 0.99598796 0.99297894 0.98194584 0.97592778 0.98495486 0.98294885 0.97294589 0.9749499 ] mean value: 0.9802459482656386 key: test_fscore value: [0.8440367 0.92173913 0.89719626 0.90434783 0.90598291 0.91525424 0.91071429 0.94017094 0.92982456 0.91743119] mean value: 0.9086698038672015 key: train_fscore value: [0.97987928 0.96062992 0.99600798 0.99297894 0.98217822 0.97637795 0.98483316 0.98274112 0.97339901 0.97507478] mean value: 
0.9804100360349339 key: test_precision value: [0.85185185 0.88333333 0.92307692 0.86666667 0.86885246 0.87096774 0.91071429 0.90163934 0.89830508 0.92592593] mean value: 0.8901333616528921 key: train_precision value: [0.98383838 0.94390716 0.99204771 0.9939759 0.96875 0.95752896 0.99185336 0.99383984 0.95736434 0.9702381 ] mean value: 0.9753343747913725 key: test_recall value: [0.83636364 0.96363636 0.87272727 0.94545455 0.94642857 0.96428571 0.91071429 0.98214286 0.96363636 0.90909091] mean value: 0.9294480519480519 key: train_recall value: [0.9759519 0.97795591 1. 0.99198397 0.99598394 0.99598394 0.97791165 0.97188755 0.98997996 0.97995992] mean value: 0.9857598731599746 key: test_roc_auc value: [0.84675325 0.91931818 0.90064935 0.9012987 0.90048701 0.90941558 0.9099026 0.93652597 0.92727273 0.91818182] mean value: 0.9069805194805195 key: train_roc_auc value: [0.97994382 0.95986149 0.99598394 0.99297994 0.9819599 0.97594788 0.98494781 0.98293776 0.97294589 0.9749499 ] mean value: 0.9802458330315249 key: test_jcc value: [0.73015873 0.85483871 0.81355932 0.82539683 0.828125 0.84375 0.83606557 0.88709677 0.86885246 0.84745763] mean value: 0.8335301021365951 key: train_jcc value: [0.96055227 0.92424242 0.99204771 0.98605578 0.96498054 0.95384615 0.97011952 0.96606786 0.94817658 0.95136187] mean value: 0.961745071907173 MCC on Blind test: 0.61 Accuracy on Blind test: 0.84 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random 
Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.07478142 0.07933569 0.06970549 0.07234645 0.08080792 0.06816864 0.06994104 0.09257722 0.09551525 0.06402373] mean value: 0.07672028541564942 key: score_time value: [0.00968766 0.00928879 0.0095427 0.00937366 0.00937152 0.00922513 0.00954008 0.00962567 0.00966001 0.0095737 ] mean value: 0.009488892555236817 key: test_mcc value: [0.6962563 0.86102173 0.73528651 0.80188377 0.78818464 0.87733514 0.78420577 0.80305531 0.80013226 0.78389404] mean value: 0.7931255470199186 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84684685 0.92792793 0.86486486 0.9009009 0.89189189 0.93693694 0.89189189 0.9009009 0.9 0.89090909] mean value: 0.8953071253071253 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83809524 0.93103448 0.85436893 0.89908257 0.89830508 0.94017094 0.89473684 0.89908257 0.89908257 0.88679245] mean value: 0.8940751679166867 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88 0.8852459 0.91666667 0.90740741 0.85483871 0.90163934 0.87931034 0.9245283 0.90740741 0.92156863] mean value: 0.89786127112259 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8 0.98181818 0.8 0.89090909 0.94642857 0.98214286 0.91071429 0.875 0.89090909 0.85454545] mean value: 0.8932467532467532 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.84642857 0.92840909 0.86428571 0.90081169 0.8913961 0.93652597 0.89172078 0.90113636 0.9 0.89090909] mean value: 0.8951623376623377 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.72131148 0.87096774 0.74576271 0.81666667 0.81538462 0.88709677 0.80952381 0.81666667 0.81666667 0.79661017] mean value: 0.8096657297803226 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.7 Accuracy on Blind test: 0.88 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.1862402 0.19378781 0.18132305 0.1813972 0.18763399 0.18403673 0.18406081 0.18009973 0.1800344 0.17876792] mean value: 0.18373818397521974 key: score_time value: [0.0201416 0.02079272 0.02035451 0.02068305 0.0194726 0.01997471 0.02043581 0.0189383 0.01899672 0.01892877] mean value: 0.0198718786239624 key: test_mcc value: [0.7306455 0.78420577 0.74772727 0.78859019 0.87733514 0.856354 0.78567192 0.84111937 0.87287156 0.8376106 ] mean value: 0.8122131325526036 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.86486486 0.89189189 0.87387387 0.89189189 0.93693694 0.92792793 0.89189189 0.91891892 0.93636364 0.91818182] mean value: 0.9052743652743653 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85981308 0.88888889 0.87272727 0.89655172 0.94017094 0.92982456 0.89655172 0.92307692 0.93693694 0.92035398] mean value: 0.9064896037893366 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88461538 0.90566038 0.87272727 0.85245902 0.90163934 0.9137931 0.86666667 0.8852459 0.92857143 0.89655172] mean value: 0.8907930219820532 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.83636364 0.87272727 0.87272727 0.94545455 0.98214286 0.94642857 0.92857143 0.96428571 0.94545455 0.94545455] mean value: 0.923961038961039 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86461039 0.89172078 0.87386364 0.89237013 0.93652597 0.92775974 0.89155844 0.91850649 0.93636364 0.91818182] mean value: 0.905146103896104 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75409836 0.8 0.77419355 0.8125 0.88709677 0.86885246 0.8125 0.85714286 0.88135593 0.85245902] mean value: 0.8300198947992465 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.84 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01281238 0.01285768 0.01273155 0.01263881 0.01268411 0.01274562 0.01275444 0.01256347 0.01274872 0.01252794] mean value: 0.012706470489501954 key: score_time value: [0.0092659 0.00906849 0.00905991 0.00917387 0.0091598 0.00904346 0.00915551 0.00906682 0.00916243 0.00901937] mean value: 0.009117555618286134 key: test_mcc value: [0.58557976 0.57765823 0.66058982 0.62443328 0.71884134 0.58760899 0.6962563 0.53199093 0.67272727 0.5304385 ] mean value: 0.6186124436559258 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.79279279 0.78378378 0.82882883 0.81081081 0.85585586 0.79279279 0.84684685 0.76576577 0.83636364 0.76363636] mean value: 0.8077477477477477 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.78899083 0.8 0.83478261 0.8173913 0.86666667 0.8034188 0.85470085 0.76363636 0.83636364 0.77586207] mean value: 0.8141813132483394 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.7962963 0.73846154 0.8 0.78333333 0.8125 0.7704918 0.81967213 0.77777778 0.83636364 0.73770492] mean value: 0.7872601434691598 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.78181818 0.87272727 0.87272727 0.85454545 0.92857143 0.83928571 0.89285714 0.75 0.83636364 0.81818182] mean value: 0.8447077922077922 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.79269481 0.78457792 0.82922078 0.8112013 0.85519481 0.79237013 0.84642857 0.76590909 0.83636364 0.76363636] mean value: 0.8077597402597403 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.65151515 0.66666667 0.71641791 0.69117647 0.76470588 0.67142857 0.74626866 0.61764706 0.71875 0.63380282] mean value: 0.6878379185440683 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.46 Accuracy on Blind test: 0.77 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [3.34462881 3.33502793 3.3337965 3.32841372 3.34471703 3.3171823 3.30564308 3.33460903 3.3558104 3.32796764] mean value: 3.3327796459198 key: score_time value: [0.09969783 0.10093069 0.09836888 0.09869719 0.0972259 0.09813404 0.09791899 0.1053617 0.10310555 0.09741282] mean value: 0.09968535900115967 key: test_mcc value: [0.82027988 0.80194805 0.89242811 0.87398511 0.89414155 0.84439989 0.856354 0.94730174 0.94561086 0.90924121] mean value: 0.878569039813939 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90990991 0.9009009 0.94594595 0.93693694 0.94594595 0.91891892 0.92792793 0.97297297 0.97272727 0.95454545] mean value: 0.9386732186732187 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90740741 0.9009009 0.94444444 0.93577982 0.94827586 0.92436975 0.92982456 0.97391304 0.97297297 0.95495495] mean value: 0.9392843712044336 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9245283 0.89285714 0.96226415 0.94444444 0.91666667 0.87301587 0.9137931 0.94915254 0.96428571 0.94642857] mean value: 0.9287436511349758 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.89090909 0.90909091 0.92727273 0.92727273 0.98214286 0.98214286 0.94642857 1. 0.98181818 0.96363636] mean value: 0.9510714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90974026 0.90097403 0.94577922 0.93685065 0.94561688 0.91834416 0.92775974 0.97272727 0.97272727 0.95454545] mean value: 0.9385064935064935 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83050847 0.81967213 0.89473684 0.87931034 0.90163934 0.859375 0.86885246 0.94915254 0.94736842 0.9137931 ] mean value: 0.8864408662809139 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.71 Accuracy on Blind test: 0.89 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, 
tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.22340345 1.18731499 1.18555117 1.22307873 1.22605681 1.20699906 1.19885516 1.21126437 1.21934485 1.25942731] mean value: 1.214129590988159 key: score_time value: [0.20892453 0.24950743 0.22536755 0.18695331 0.23490357 0.29782224 0.26686096 0.28301716 0.21349239 0.1928103 ] mean value: 0.23596594333648682 key: test_mcc value: [0.80188377 0.78434561 0.87508299 0.91006494 0.86075909 0.84439989 0.856354 0.93029809 0.92727273 0.90924121] mean value: 0.86997023122815 key: train_mcc value: [0.9540162 0.95395495 0.94606663 0.94592996 0.94198 0.94994386 0.94198 0.94390049 0.94598487 0.94593927] mean value: 0.9469696225050219 key: test_accuracy value: [0.9009009 0.89189189 0.93693694 0.95495495 0.92792793 0.91891892 0.92792793 0.96396396 0.96363636 0.95454545] mean value: 0.9341605241605242 key: train_accuracy value: [0.97693079 0.97693079 0.97291876 0.97291876 0.97091274 0.97492477 0.97091274 0.97191575 0.97294589 0.97294589] mean value: 0.9734256878852992 key: test_fscore value: [0.89908257 0.89285714 0.93457944 0.95495495 0.93220339 0.92436975 0.92982456 0.96551724 0.96363636 0.95495495] mean value: 0.935198036497558 key: train_fscore value: [0.97715988 0.97711443 0.97324083 0.97313433 0.97114428 0.97507478 0.97114428 0.97205589 0.97313433 0.97308076] mean value: 0.9736283776755992 key: test_precision value: [0.90740741 0.87719298 0.96153846 0.94642857 0.88709677 0.87301587 0.9137931 0.93333333 0.96363636 0.94642857] mean value: 0.9209871441886547 key: train_precision value: [0.96850394 0.97035573 0.9627451 
0.96640316 0.96252465 0.96831683 0.96252465 0.96626984 0.96640316 0.96825397] mean value: 0.966230104125473 key: test_recall value: [0.89090909 0.90909091 0.90909091 0.96363636 0.98214286 0.98214286 0.94642857 1. 0.96363636 0.96363636] mean value: 0.9510714285714286 key: train_recall value: [0.98597194 0.98396794 0.98396794 0.97995992 0.97991968 0.98192771 0.97991968 0.97791165 0.97995992 0.97795591] mean value: 0.9811462281993706 key: test_roc_auc value: [0.90081169 0.89204545 0.93668831 0.95503247 0.92743506 0.91834416 0.92775974 0.96363636 0.96363636 0.95454545] mean value: 0.9339935064935065 key: train_roc_auc value: [0.97692171 0.97692373 0.97290766 0.97291169 0.97092176 0.97493179 0.97092176 0.97192176 0.97294589 0.97294589] mean value: 0.9734253647857964 key: test_jcc value: [0.81666667 0.80645161 0.87719298 0.9137931 0.87301587 0.859375 0.86885246 0.93333333 0.92982456 0.9137931 ] mean value: 0.8792298695691693 key: train_jcc value: [0.95533981 0.95525292 0.94787645 0.94767442 0.94390716 0.95136187 0.94390716 0.94563107 0.94767442 0.94757282] mean value: 0.9486198073744585 MCC on Blind test: 0.76 Accuracy on Blind test: 0.9 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02724433 0.01684785 0.0170033 0.01703453 0.01710773 0.01734543 0.01699972 0.01705408 0.01712227 0.01719022] mean value: 0.018094944953918456 key: score_time value: [0.0127182 0.01238012 0.01267314 0.01244044 0.01244259 0.01242185 0.01242566 0.0123899 0.01235771 0.01240087] mean value: 0.01246504783630371 key: test_mcc value: [0.58827674 0.64014294 0.56929191 0.7137294 0.62939373 0.69891539 0.66004053 0.64303575 0.73323558 0.80119274] mean value: 0.667725470808785 key: train_mcc value: [0.70693545 0.72572666 0.68759086 0.71792245 0.68682818 0.71394469 0.68575288 0.7136947 0.7113771 0.71018107] mean value: 0.7059954028409837 key: test_accuracy value: [0.79279279 0.81981982 0.78378378 0.85585586 0.81081081 0.84684685 0.82882883 0.81981982 0.86363636 0.9 ] mean value: 0.8322194922194922 key: train_accuracy value: [0.85155466 0.86158475 0.84252758 0.85757272 0.84152457 0.8555667 0.84152457 0.8555667 0.85470942 0.85370741] mean value: 0.8515839100467736 key: test_fscore value: [0.8 0.82142857 0.78947368 0.85964912 0.82644628 0.85714286 0.83760684 0.83050847 0.87179487 0.90265487] mean value: 0.8396705567815326 key: train_fscore value: [0.85904762 0.86730769 0.84918348 0.86372361 0.84923664 0.86153846 0.84807692 0.86127168 0.85990338 0.85988484] mean value: 0.8579174317858217 key: test_precision value: [0.76666667 0.80701754 0.76271186 0.83050847 0.76923077 0.80952381 0.80327869 0.79032258 0.82258065 0.87931034] mean value: 0.8041151387422574 key: train_precision value: [0.8185118 0.8336414 0.81549815 0.82872928 0.80909091 0.82656827 0.81365314 0.82777778 0.83022388 
0.82504604] mean value: 0.8228740648484011 key: test_recall value: [0.83636364 0.83636364 0.81818182 0.89090909 0.89285714 0.91071429 0.875 0.875 0.92727273 0.92727273] mean value: 0.8789935064935065 key: train_recall value: [0.90380762 0.90380762 0.88577154 0.90180361 0.8935743 0.89959839 0.88554217 0.89759036 0.89178357 0.89779559] mean value: 0.896107475996169 key: test_roc_auc value: [0.79318182 0.81996753 0.78409091 0.85616883 0.81006494 0.84626623 0.82840909 0.81931818 0.86363636 0.9 ] mean value: 0.8321103896103896 key: train_roc_auc value: [0.8515022 0.86154236 0.84248417 0.85752831 0.84157673 0.85561082 0.84156868 0.85560881 0.85470942 0.85370741] mean value: 0.8515838906729121 key: test_jcc value: [0.66666667 0.6969697 0.65217391 0.75384615 0.70422535 0.75 0.72058824 0.71014493 0.77272727 0.82258065] mean value: 0.7249922863357584 key: train_jcc value: [0.75292154 0.76570458 0.73789649 0.76013514 0.73797678 0.75675676 0.73622705 0.75634518 0.75423729 0.75420875] mean value: 0.7512409553820072 MCC on Blind test: 0.52 Accuracy on Blind test: 0.8 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.16932011 0.14638591 0.19654107 0.18655062 0.15451241 0.14728117 0.14999509 0.14579248 0.15139866 0.15017271] mean value: 0.15979502201080323 key: score_time value: [0.01136637 0.01236463 0.0134182 0.01141644 0.0115819 0.01142669 0.01245141 0.01140976 0.01291013 0.01137543] mean value: 0.01197209358215332 key: test_mcc value: [0.87402597 0.87402597 0.87398511 0.94735177 0.86471225 0.87508299 0.856354 0.94730174 0.82035423 0.9104463 ] mean value: 0.8843640328352657 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93693694 0.93693694 0.93693694 0.97297297 0.92792793 0.93693694 0.92792793 0.97297297 0.90909091 0.95454545] mean value: 0.9413185913185913 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93693694 0.93693694 0.93577982 0.97345133 0.93333333 0.93913043 0.92982456 0.97391304 0.9122807 0.95575221] mean value: 0.9427339304962741 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92857143 0.92857143 0.94444444 0.94827586 0.875 0.91525424 0.9137931 0.94915254 0.88135593 0.93103448] mean value: 0.9215453461727571 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.94545455 0.94545455 0.92727273 1. 1. 0.96428571 0.94642857 1. 0.94545455 0.98181818] mean value: 0.9656168831168831 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 
1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93701299 0.93701299 0.93685065 0.97321429 0.92727273 0.93668831 0.92775974 0.97272727 0.90909091 0.95454545] mean value: 0.9412175324675325 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88135593 0.88135593 0.87931034 0.94827586 0.875 0.8852459 0.86885246 0.94915254 0.83870968 0.91525424] mean value: 0.8922512889039441 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.91 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.07442832 0.07900333 0.06712317 0.0942688 0.07648659 0.10709262 0.08123636 0.103127 0.07679653 0.08166337] mean value: 0.08412261009216308 key: score_time value: [0.02036667 0.01255226 0.01248574 0.02048302 0.01988649 0.01711273 0.01995468 0.02388263 0.01327038 0.02085662] mean value: 0.018085122108459473 key: test_mcc value: [0.66058982 0.74983877 0.78434561 0.79230071 0.71884134 0.70720342 0.68237361 0.78818464 0.82035423 0.85967619] mean value: 0.7563708342016795 key: train_mcc value: [0.84314661 0.84170481 0.84384054 0.84384054 0.84930373 0.84740528 0.83959549 0.8341973 0.83810503 0.84400673] mean value: 0.8425146060693494 key: test_accuracy value: 
[0.82882883 0.87387387 0.89189189 0.89189189 0.85585586 0.84684685 0.83783784 0.89189189 0.90909091 0.92727273] mean value: 0.8755282555282555 key: train_accuracy value: [0.92076229 0.91975928 0.92076229 0.92076229 0.92377131 0.9227683 0.91875627 0.91574724 0.91783567 0.92084168] mean value: 0.9201766622512829 key: test_fscore value: [0.83478261 0.87719298 0.89285714 0.89830508 0.86666667 0.86178862 0.85 0.89830508 0.9122807 0.93103448] mean value: 0.8823213372566312 key: train_fscore value: [0.92322643 0.92263056 0.9236715 0.9236715 0.92607004 0.92517007 0.9214355 0.91891892 0.92084942 0.9236715 ] mean value: 0.9229315433333662 key: test_precision value: [0.8 0.84745763 0.87719298 0.84126984 0.8125 0.79104478 0.796875 0.85483871 0.88135593 0.8852459 ] mean value: 0.8387780770484182 key: train_precision value: [0.89622642 0.89158879 0.89179104 0.89179104 0.89811321 0.89642185 0.89118199 0.88475836 0.88826816 0.89179104] mean value: 0.8921931897070797 key: test_recall value: [0.87272727 0.90909091 0.90909091 0.96363636 0.92857143 0.94642857 0.91071429 0.94642857 0.94545455 0.98181818] mean value: 0.9313961038961038 key: train_recall value: [0.95190381 0.95591182 0.95791583 0.95791583 0.95582329 0.95582329 0.95381526 0.95582329 0.95591182 0.95791583] mean value: 0.9558760090462048 key: test_roc_auc value: [0.82922078 0.87418831 0.89204545 0.89253247 0.85519481 0.84594156 0.83717532 0.8913961 0.90909091 0.92727273] mean value: 0.8754058441558441 key: train_roc_auc value: [0.92073102 0.91972298 0.92072498 0.92072498 0.92380343 0.92280143 0.9187914 0.9157874 0.91783567 0.92084168] mean value: 0.9201764975734601 key: test_jcc value: [0.71641791 0.78125 0.80645161 0.81538462 0.76470588 0.75714286 0.73913043 0.81538462 0.83870968 0.87096774] mean value: 0.7905545347753463 key: train_jcc value: [0.85740072 0.85637343 0.85816876 0.85816876 0.86231884 0.86075949 0.85431655 0.85 0.85330948 0.85816876] mean value: 0.8568984796998163 MCC on Blind test: 0.57 Accuracy on Blind 
test: 0.82 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', 
RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01855206 0.01753569 0.01760435 0.01728177 0.01746941 0.01717734 0.01688004 0.01726413 0.01733756 0.01808047] mean value: 0.017518281936645508 key: score_time value: [0.01274776 0.01306725 0.01289153 0.01265955 0.0127809 0.01270652 0.01321983 0.01303244 0.01284075 0.01262331] mean value: 0.01285698413848877 key: test_mcc value: [0.55139323 0.53257612 0.44274592 0.60540128 0.69373177 0.6962563 0.60540128 0.6962563 0.6401844 0.72835704] mean value: 0.6192303636153702 key: train_mcc value: [0.63490435 0.64093618 0.63093355 0.64504601 0.6248833 0.64093185 0.62689745 0.64098055 0.62750153 0.60320641] mean value: 0.6316221168713542 key: test_accuracy value: [0.77477477 0.76576577 0.72072072 0.8018018 0.84684685 0.84684685 0.8018018 0.84684685 0.81818182 0.86363636] mean value: 0.8087223587223588 key: train_accuracy value: [0.81745236 0.82046138 0.81544634 0.8224674 0.81243731 0.82046138 0.81344032 0.82046138 0.81362725 0.80160321] mean value: 0.8157858344572797 key: test_fscore value: [0.76190476 0.75471698 0.7047619 0.80701754 0.84955752 0.85470085 0.7962963 0.85470085 0.82758621 0.85981308] mean value: 0.8071056010488992 key: train_fscore value: [0.81763527 0.8201005 0.81673307 
0.82103134 0.81168177 0.81973817 0.8125 0.8190091 0.81097561 0.80160321] mean value: 0.8151008041422524 key: test_precision value: [0.8 0.78431373 0.74 0.77966102 0.84210526 0.81967213 0.82692308 0.81967213 0.78688525 0.88461538] mean value: 0.8083847975332427 key: train_precision value: [0.81763527 0.82258065 0.81188119 0.82857143 0.81414141 0.82222222 0.81578947 0.82484725 0.82268041 0.80160321] mean value: 0.8181952511733585 key: test_recall value: [0.72727273 0.72727273 0.67272727 0.83636364 0.85714286 0.89285714 0.76785714 0.89285714 0.87272727 0.83636364] mean value: 0.8083441558441559 key: train_recall value: [0.81763527 0.81763527 0.82164329 0.81362725 0.80923695 0.81726908 0.80923695 0.81325301 0.7995992 0.80160321] mean value: 0.8120739470909691 key: test_roc_auc value: [0.77435065 0.76542208 0.72029221 0.80211039 0.84675325 0.84642857 0.80211039 0.84642857 0.81818182 0.86363636] mean value: 0.8085714285714286 key: train_roc_auc value: [0.81745217 0.82046422 0.81544012 0.82247628 0.81243411 0.82045819 0.81343611 0.82045416 0.81362725 0.80160321] mean value: 0.815784581210614 key: test_jcc value: [0.61538462 0.60606061 0.54411765 0.67647059 0.73846154 0.74626866 0.66153846 0.74626866 0.70588235 0.75409836] mean value: 0.6794551483769089 key: train_jcc value: [0.69152542 0.69505963 0.69023569 0.69639794 0.68305085 0.69453925 0.68421053 0.69349315 0.68205128 0.66889632] mean value: 0.6879460057585034 MCC on Blind test: 0.5 Accuracy on Blind test: 0.78 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03212476 0.02761483 0.03664923 0.02784562 0.0398612 0.0308187 0.02523756 0.03237534 0.03949404 0.03122282] mean value: 0.03232440948486328 key: score_time value: [0.01321888 0.012743 0.01287627 0.0127027 0.01272225 0.01279831 0.01288557 0.01277661 0.01274395 0.01294923] mean value: 0.01284167766571045 key: test_mcc value: [0.67994289 0.68231769 0.83897362 0.7763355 0.70340005 0.74951538 0.63046459 0.71350607 0.80332642 0.80119274] mean value: 0.7378974954164732 key: train_mcc value: [0.82132227 0.75838527 0.83021815 0.81368596 0.75818559 0.83584066 0.75197395 0.73170243 0.82534613 0.82878108] mean value: 0.7955441479582855 key: test_accuracy value: [0.83783784 0.82882883 0.91891892 0.88288288 0.84684685 0.87387387 0.81081081 0.83783784 0.9 0.9 ] mean value: 0.8637837837837838 key: train_accuracy value: [0.90672016 0.87462387 0.91474423 0.90270812 0.87061184 0.91775326 0.87362086 0.85255767 0.91182365 0.91382766] mean value: 0.893899132266539 key: test_fscore value: [0.84482759 0.8 0.91588785 0.8907563 0.83495146 0.87931034 0.7961165 0.86153846 0.8952381 0.90265487] mean value: 0.8621281469221023 key: train_fscore value: [0.91283974 0.86427796 0.91658489 0.90926099 0.85521886 0.91881188 0.86595745 0.86979628 0.90890269 0.91601563] mean value: 0.8937666354668264 key: test_precision value: [0.80327869 0.95 0.94230769 0.828125 0.91489362 0.85 0.87234043 0.75675676 0.94 0.87931034] mean value: 0.8737012524969817 key: train_precision value: [0.85739437 0.94312796 0.89807692 0.85263158 0.96946565 0.90625 0.92081448 0.77812995 
0.94004283 0.89333333] mean value: 0.8959267071141968 key: test_recall value: [0.89090909 0.69090909 0.89090909 0.96363636 0.76785714 0.91071429 0.73214286 1. 0.85454545 0.92727273] mean value: 0.8628896103896104 key: train_recall value: [0.9759519 0.79759519 0.93587174 0.9739479 0.76506024 0.93172691 0.81726908 0.98594378 0.87975952 0.93987976] mean value: 0.9003006012024048 key: test_roc_auc value: [0.83831169 0.8275974 0.91866883 0.8836039 0.84756494 0.87353896 0.81152597 0.83636364 0.9 0.9 ] mean value: 0.8637175324675325 key: train_roc_auc value: [0.90665065 0.87470121 0.91472302 0.9026366 0.87050607 0.91776726 0.8735644 0.85269133 0.91182365 0.91382766] mean value: 0.8938891839904708 key: test_jcc value: [0.73134328 0.66666667 0.84482759 0.8030303 0.71666667 0.78461538 0.66129032 0.75675676 0.81034483 0.82258065] mean value: 0.7598122442852906 key: train_jcc value: [0.83965517 0.76099426 0.84601449 0.83361921 0.74705882 0.84981685 0.76360225 0.76959248 0.83301708 0.84504505] mean value: 0.8088415664093777 MCC on Blind test: 0.59 Accuracy on Blind test: 0.84 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03614044 0.03646159 0.03469062 0.0356729 0.0328548 0.03632474 0.04048204 0.03571701 0.04990339 0.0309155 ] mean value: 0.03691630363464356 key: score_time value: [0.01274538 0.01309633 0.01829362 0.02165842 0.01910233 0.02067685 0.01297855 0.01283383 0.01271367 0.01318717] mean value: 0.01572861671447754 key: test_mcc value: [0.65842676 0.80845318 0.82205752 0.72987013 0.69004484 0.73247207 0.33903271 0.62431781 0.78181818 0.56602204] mean value: 0.6752515240726457 key: train_mcc value: [0.85974878 0.82671395 0.83090715 0.82857913 0.71937214 0.81951538 0.31729713 0.73967951 0.85253237 0.69525894] mean value: 0.7489604482361627 key: test_accuracy value: [0.82882883 0.9009009 0.90990991 0.86486486 0.82882883 0.86486486 0.61261261 0.79279279 0.89090909 0.76363636] mean value: 0.8258149058149058 key: train_accuracy value: [0.92978937 0.90972919 0.91273821 0.91273821 0.8445336 0.90972919 0.59779338 0.86058175 0.9258517 0.83366733] mean value: 0.863715193677224 key: test_fscore value: [0.82242991 0.90598291 0.9122807 0.86486486 0.85271318 0.87179487 0.3943662 0.75268817 0.89090909 0.71111111] mean value: 0.7979141000479969 key: train_fscore value: [0.93055556 0.91541353 0.91753555 0.90890052 0.86391572 0.90909091 0.33499171 0.84293785 0.92745098 0.80652681] mean value: 0.8357319130757249 key: test_precision value: [0.84615385 0.85483871 0.88135593 0.85714286 0.75342466 0.83606557 0.93333333 0.94594595 0.89090909 0.91428571] mean value: 0.8713455660956335 key: train_precision value: [0.92141454 0.8619469 0.8705036 0.95175439 0.7675507 0.91463415 
0.96190476 0.96382429 0.90786948 0.9637883 ] mean value: 0.9085191106333975 key: test_recall value: [0.8 0.96363636 0.94545455 0.87272727 0.98214286 0.91071429 0.25 0.625 0.89090909 0.58181818] mean value: 0.7822402597402597 key: train_recall value: [0.93987976 0.9759519 0.96993988 0.86973948 0.98795181 0.90361446 0.20281124 0.74899598 0.94789579 0.69338677] mean value: 0.8240167081150253 key: test_roc_auc value: [0.82857143 0.90146104 0.91022727 0.86493506 0.82743506 0.86444805 0.61590909 0.79431818 0.89090909 0.76363636] mean value: 0.8261850649350649 key: train_roc_auc value: [0.92977924 0.9096627 0.91268078 0.91278139 0.84467731 0.90972306 0.59739761 0.86046994 0.9258517 0.83366733] mean value: 0.8636691052788308 key: test_jcc value: [0.6984127 0.828125 0.83870968 0.76190476 0.74324324 0.77272727 0.24561404 0.60344828 0.80327869 0.55172414] mean value: 0.6847187791112744 key: train_jcc value: [0.87012987 0.8440208 0.84763573 0.83301344 0.76043277 0.83333333 0.20119522 0.72851562 0.86471664 0.67578125] mean value: 0.7458774660122005 MCC on Blind test: 0.67 Accuracy on Blind test: 0.87 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.39727902 0.38057208 0.38459516 0.39634442 0.40336704 0.36749411 0.36946654 0.36745024 0.36640286 0.3657105 ] mean value: 0.37986819744110106 key: score_time value: [0.01661921 0.01776004 0.01697946 0.0177021 0.01578593 0.01602459 0.01588058 0.0157671 0.01576114 0.01573563] mean value: 0.01640157699584961 key: test_mcc value: [0.83912942 0.85584416 0.89188312 0.92854828 0.84111937 0.82182846 0.83793444 0.91119237 0.855111 0.9104463 ] mean value: 0.8693036918146279 key: train_mcc value: [0.95598996 0.94994245 0.93796514 0.9439521 0.95192505 0.94599262 0.97006835 0.93608015 0.95217918 0.95005434] mean value: 0.949414934136432 key: test_accuracy value: [0.91891892 0.92792793 0.94594595 0.96396396 0.91891892 0.90990991 0.91891892 0.95495495 0.92727273 0.95454545] mean value: 0.9341277641277641 key: train_accuracy value: [0.9779338 0.97492477 0.96890672 0.97191575 0.97592778 0.97291876 0.98495486 0.96790371 0.9759519 0.9749499 ] mean value: 0.974628796208264 key: test_fscore value: [0.92035398 0.92727273 0.94545455 0.96428571 0.92307692 0.9137931 0.92035398 0.95652174 0.92857143 0.95575221] mean value: 0.93554363582312 key: train_fscore value: [0.97813121 0.97512438 0.96921549 0.972167 0.9760479 0.97313433 0.98507463 0.96825397 0.97623762 0.97517378] mean value: 0.974856031535136 key: test_precision value: [0.89655172 0.92727273 0.94545455 0.94736842 0.8852459 0.88333333 0.9122807 0.93220339 0.9122807 0.93103448] mean value: 0.9173025928988414 key: train_precision value: [0.9704142 0.96837945 0.96062992 0.96449704 0.9702381 0.96449704 0.97633136 
0.95686275 0.96477495 0.96653543] mean value: 0.9663160237353894 key: test_recall value: [0.94545455 0.92727273 0.94545455 0.98181818 0.96428571 0.94642857 0.92857143 0.98214286 0.94545455 0.98181818] mean value: 0.9548701298701299 key: train_recall value: [0.98597194 0.98196393 0.97795591 0.97995992 0.98192771 0.98192771 0.9939759 0.97991968 0.98797595 0.98396794] mean value: 0.9835546595198429 key: test_roc_auc value: [0.91915584 0.92792208 0.94594156 0.96412338 0.91850649 0.90957792 0.91883117 0.95470779 0.92727273 0.95454545] mean value: 0.9340584415584415 key: train_roc_auc value: [0.97792573 0.97491771 0.96889763 0.97190767 0.9759338 0.97292778 0.9849639 0.96791575 0.9759519 0.9749499 ] mean value: 0.9746291780347844 key: test_jcc value: [0.85245902 0.86440678 0.89655172 0.93103448 0.85714286 0.84126984 0.85245902 0.91666667 0.86666667 0.91525424] mean value: 0.8793911288378621 key: train_jcc value: [0.95719844 0.95145631 0.94026975 0.94584139 0.95321637 0.94767442 0.97058824 0.93846154 0.95357834 0.95155039] mean value: 0.9509835187210858 MCC on Blind test: 0.81 Accuracy on Blind test: 0.92 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.24564409 0.23522663 0.257339 0.24964762 0.25360632 0.24560642 0.24368262 0.23203063 0.23844266 0.23963594] mean value: 0.2440861940383911 key: score_time value: [0.03826308 0.04138947 0.02316594 0.02756476 0.03848481 0.02812314 0.03311872 0.02815628 0.0273447 0.04294181] mean value: 0.03285527229309082 key: test_mcc value: [0.78376623 0.78859019 0.89242811 0.94735177 0.83897362 0.82027988 0.856354 0.856354 0.89149871 0.92727273] mean value: 0.8602869242483097 key: train_mcc value: [0.99598796 1. 
0.99200792 0.98597563 0.99598796 0.99198394 0.99200792 0.98997191 0.99398997 0.99599198] mean value: 0.9933905192511064 key: test_accuracy value: [0.89189189 0.89189189 0.94594595 0.97297297 0.91891892 0.90990991 0.92792793 0.92792793 0.94545455 0.96363636] mean value: 0.9296478296478297 key: train_accuracy value: [0.99799398 1. 0.99598796 0.99297894 0.99799398 0.99598796 0.99598796 0.99498495 0.99699399 0.99799599] mean value: 0.9966905727201645 key: test_fscore value: [0.89090909 0.89655172 0.94444444 0.97345133 0.92173913 0.9122807 0.92982456 0.92982456 0.94642857 0.96363636] mean value: 0.9309090476986216 key: train_fscore value: [0.99799599 1. 0.99597586 0.99300699 0.99799197 0.99599198 0.996 0.99498495 0.996997 0.99799599] mean value: 0.9966940735806726 key: test_precision value: [0.89090909 0.85245902 0.96226415 0.94827586 0.89830508 0.89655172 0.9137931 0.9137931 0.92982456 0.96363636] mean value: 0.9169812061135013 key: train_precision value: [0.99799599 1. 1. 0.99003984 0.99799197 0.994 0.99203187 0.99398798 0.996 0.99799599] mean value: 0.9960043640938736 key: test_recall value: [0.89090909 0.94545455 0.92727273 1. 0.94642857 0.92857143 0.94642857 0.94642857 0.96363636 0.96363636] mean value: 0.9458766233766234 key: train_recall value: [0.99799599 1. 0.99198397 0.99599198 0.99799197 0.99799197 1. 0.99598394 0.99799599 0.99799599] mean value: 0.9973931799341655 key: test_roc_auc value: [0.89188312 0.89237013 0.94577922 0.97321429 0.91866883 0.90974026 0.92775974 0.92775974 0.94545455 0.96363636] mean value: 0.9296266233766234 key: train_roc_auc value: [0.99799398 1. 0.99599198 0.99297591 0.99799398 0.99598997 0.99599198 0.99498596 0.99699399 0.99799599] mean value: 0.9966913747173061 key: test_jcc value: [0.80327869 0.8125 0.89473684 0.94827586 0.85483871 0.83870968 0.86885246 0.86885246 0.89830508 0.92982456] mean value: 0.8718174343977652 key: train_jcc value: [0.996 1. 
0.99198397 0.98611111 0.99599198 0.99201597 0.99203187 0.99001996 0.99401198 0.996 ] mean value: 0.9934166839716496 MCC on Blind test: 0.71 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.55467296 0.51923943 0.50953078 0.58113885 0.50787997 0.54556322 0.46287131 0.50236225 0.59216118 0.5354085 ] mean value: 0.5310828447341919 key: score_time value: [0.04145479 0.02731085 0.04424119 0.04497147 0.0242722 0.04455423 0.04341412 0.03534317 0.04339361 0.04950762] mean value: 0.0398463249206543 key: test_mcc value: [0.66058982 0.71955846 0.73090707 0.79230071 0.66254427 0.73247207 0.71884134 0.78818464 0.8187233 0.80119274] mean value: 0.7425314425935436 key: train_mcc value: [0.95007973 0.95796744 0.96802783 0.95792134 0.95796862 0.95593733 0.95994962 0.95606082 0.96004323 0.9580101 ] mean value: 0.9581966061111651 key: test_accuracy value: [0.82882883 0.85585586 0.86486486 0.89189189 0.82882883 0.86486486 0.85585586 0.89189189 0.90909091 0.9 ] mean value: 0.8691973791973792 key: train_accuracy value: [0.97492477 0.97893681 0.98395186 0.97893681 0.97893681 0.9779338 0.97993982 0.9779338 0.97995992 0.97895792] mean value: 0.9790412319121694 key: test_fscore value: 
[0.83478261 0.86440678 0.86725664 0.89830508 0.84033613 0.87179487 0.86666667 0.89830508 0.91071429 0.89719626] mean value: 0.8749764415328185 key: train_fscore value: [0.97522299 0.97910448 0.98409543 0.97906281 0.97906281 0.97804391 0.98003992 0.97813121 0.98011928 0.97910448] mean value: 0.9791987328205536 key: test_precision value: [0.8 0.80952381 0.84482759 0.84126984 0.79365079 0.83606557 0.8125 0.85483871 0.89473684 0.92307692] mean value: 0.8410490079281439 key: train_precision value: [0.96470588 0.97233202 0.97633136 0.97420635 0.97227723 0.97222222 0.97420635 0.96850394 0.97238659 0.97233202] mean value: 0.971950394805701 key: test_recall value: [0.87272727 0.92727273 0.89090909 0.96363636 0.89285714 0.91071429 0.92857143 0.94642857 0.92727273 0.87272727] mean value: 0.9133116883116883 key: train_recall value: [0.98597194 0.98597194 0.99198397 0.98396794 0.98594378 0.98393574 0.98594378 0.98795181 0.98797595 0.98597194] mean value: 0.9865618787776356 key: test_roc_auc value: [0.82922078 0.85649351 0.8650974 0.89253247 0.82824675 0.86444805 0.85519481 0.8913961 0.90909091 0.9 ] mean value: 0.8691720779220778 key: train_roc_auc value: [0.97491368 0.97892975 0.98394379 0.97893176 0.97894383 0.97793982 0.97994584 0.97794384 0.97995992 0.97895792] mean value: 0.9790410137544164 key: test_jcc value: [0.71641791 0.76119403 0.765625 0.81538462 0.72463768 0.77272727 0.76470588 0.81538462 0.83606557 0.81355932] mean value: 0.7785701903111762 key: train_jcc value: [0.9516441 0.95906433 0.96868885 0.95898438 0.95898438 0.95703125 0.96086106 0.95719844 0.96101365 0.95906433] mean value: 0.959253474650761 MCC on Blind test: 0.54 Accuracy on Blind test: 0.81 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.71389413 1.69086385 1.72496676 1.6965642 1.69422984 1.68370032 1.69756031 1.69427085 1.6981957 1.68466806] mean value: 1.697891402244568 key: score_time value: [0.01014042 0.00983286 0.00966883 0.00983334 0.01002479 0.00973153 0.00976062 0.00952935 0.01008153 0.00966859] mean value: 0.009827184677124023 key: test_mcc value: [0.85816689 0.86102173 0.87398511 0.93038564 0.88077101 0.85798501 0.83793444 0.91355091 0.87402845 0.92973479] mean value: 0.8817563969991022 key: train_mcc value: [0.99200779 0.99198387 0.98799559 0.98998777 0.98402331 0.99200792 0.99200792 0.98803559 0.99201584 0.99400594] mean value: 0.9904071544165598 key: test_accuracy value: [0.92792793 0.92792793 0.93693694 0.96396396 0.93693694 0.92792793 0.91891892 0.95495495 0.93636364 0.96363636] mean value: 0.9395495495495495 key: train_accuracy value: [0.99598796 0.99598796 0.99398195 0.99498495 0.99197593 0.99598796 0.99598796 0.99398195 0.99599198 0.99699399] mean value: 0.9951862601833557 key: test_fscore value: [0.92982456 0.93103448 0.93577982 0.96491228 0.94117647 0.93103448 0.92035398 0.95726496 0.9380531 0.96491228] mean value: 0.9414346412337231 key: train_fscore value: [0.99600798 0.996 0.99401198 0.995005 0.99201597 0.996 0.996 0.99401198 0.99600798 0.997003 ] mean value: 0.9952063880231545 key: test_precision value: [0.89830508 0.8852459 0.94444444 0.93220339 0.88888889 0.9 0.9122807 0.91803279 0.9137931 0.93220339] mean value: 0.9125397691467365 key: train_precision value: 
[0.99204771 0.99401198 0.99005964 0.99203187 0.98611111 0.99203187 0.99203187 0.98809524 0.99204771 0.9940239 ] mean value: 0.9912492916749109 key: test_recall value: [0.96363636 0.98181818 0.92727273 1. 1. 0.96428571 0.92857143 1. 0.96363636 1. ] mean value: 0.972922077922078 key: train_recall value: [1. 0.99799599 0.99799599 0.99799599 0.99799197 1. 1. 1. 1. 1. ] mean value: 0.999197994382339 key: test_roc_auc value: [0.92824675 0.92840909 0.93685065 0.96428571 0.93636364 0.9275974 0.91883117 0.95454545 0.93636364 0.96363636] mean value: 0.939512987012987 key: train_roc_auc value: [0.99598394 0.99598595 0.99397792 0.99498193 0.99198196 0.99599198 0.99599198 0.99398798 0.99599198 0.99699399] mean value: 0.9951869602659134 key: test_jcc value: [0.86885246 0.87096774 0.87931034 0.93220339 0.88888889 0.87096774 0.85245902 0.91803279 0.88333333 0.93220339] mean value: 0.8897219092876875 key: train_jcc value: [0.99204771 0.99203187 0.98809524 0.99005964 0.98415842 0.99203187 0.99203187 0.98809524 0.99204771 0.9940239 ] mean value: 0.9904623483526916 MCC on Blind test: 0.77 Accuracy on Blind test: 0.91 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: 
Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04374695 0.06547046 0.04631853 0.04311037 0.04469585 0.04323721 0.04338408 0.04668546 0.04348207 0.04395962] mean value: 0.04640905857086182 key: score_time value: [0.01359057 0.01312923 0.01343632 0.01324058 0.01333547 0.01309013 0.01344275 0.01334 0.01368546 0.01321197] mean value: 0.013350248336791992 key: test_mcc value: [0.47871355 0.3222257 0.33903271 0.35540963 0.25544091 0.41410537 0.41410537 0.47304992 0.30976699 0.45693677] mean value: 0.38187869064725444 key: train_mcc value: [0.44253373 0.43112172 0.44739295 0.41128449 0.40377368 0.40879303 0.43189143 0.40544978 0.39041637 0.43206773] mean value: 0.42047249016488764 key: test_accuracy value: [0.68468468 0.6036036 0.61261261 0.62162162 0.58558559 0.64864865 0.64864865 0.68468468 0.6 0.67272727] mean value: 0.6362817362817363 key: train_accuracy value: [0.66399198 0.65697091 0.667001 0.6449348 0.63991976 0.64292879 0.65697091 0.64092277 0.63226453 0.65731463] mean value: 0.6503220081084938 key: test_fscore value: [0.75862069 0.71052632 0.71523179 0.72 0.7012987 0.74172185 0.74172185 0.76190476 0.71052632 0.75342466] mean value: 0.7314976938660571 key: train_fscore value: [0.74868717 0.74477612 0.75037594 0.73816568 0.73505535 0.73668639 0.74439462 0.73559823 0.73113553 0.74477612] mean value: 0.7409651149451728 key: test_precision value: [0.61111111 0.55670103 0.5625 0.56842105 0.55102041 0.58947368 0.58947368 0.61538462 0.55670103 0.6043956 ] mean value: 0.5805182221962898 key: train_precision value: [0.59832134 0.59334126 0.60048135 0.58499414 0.58109685 0.58313817 0.59285714 0.5817757 
0.57621247 0.59334126] mean value: 0.5885559687543657 key: test_recall value: [1. 0.98181818 0.98181818 0.98181818 0.96428571 1. 1. 1. 0.98181818 1. ] mean value: 0.9891558441558441 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6875 0.60698052 0.61590909 0.62483766 0.58214286 0.64545455 0.64545455 0.68181818 0.6 0.67272727] mean value: 0.6362824675324675 key: train_roc_auc value: [0.66365462 0.65662651 0.66666667 0.64457831 0.64028056 0.64328657 0.65731463 0.64128257 0.63226453 0.65731463] mean value: 0.6503269591391618 key: test_jcc value: [0.61111111 0.55102041 0.55670103 0.5625 0.54 0.58947368 0.58947368 0.61538462 0.55102041 0.6043956 ] mean value: 0.5771080546566749 key: train_jcc value: [0.59832134 0.59334126 0.60048135 0.58499414 0.58109685 0.58313817 0.59285714 0.5817757 0.57621247 0.59334126] mean value: 0.5885559687543657 MCC on Blind test: 0.14 Accuracy on Blind test: 0.43 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02749586 0.04280066 0.02907753 0.03794432 0.03937221 0.03592968 0.04454541 0.04316807 0.04792213 0.0542562 ] mean value: 0.04025120735168457 key: score_time value: [0.01947474 0.02240396 0.02471495 0.02588439 0.02503633 0.01949716 0.01938891 0.01940846 0.01542902 0.02859068] mean value: 0.021982860565185548 key: test_mcc value: [0.66058982 0.78434561 0.81980519 0.80845318 0.80802876 0.73912573 0.7306455 0.78087736 0.78651226 0.85681441] mean value: 0.7775197836320717 key: train_mcc value: [0.83473792 0.82049509 0.81956135 0.81892375 0.82857913 0.82239706 0.819597 0.81679117 0.82443086 0.8253123 ] mean value: 0.8230825643597578 key: test_accuracy value: [0.82882883 0.89189189 0.90990991 0.9009009 0.9009009 0.86486486 0.86486486 0.88288288 0.89090909 0.92727273] mean value: 0.8863226863226863 key: train_accuracy value: [0.91574724 0.90872618 0.90772317 0.90772317 0.91273821 0.90972919 0.90772317 0.90672016 0.91082164 0.91082164] mean value: 0.909847377804757 key: test_fscore value: [0.83478261 0.89285714 0.90909091 0.90598291 0.90756303 0.87603306 0.86956522 0.89430894 0.89655172 0.92982456] mean value: 0.8916560095710109 key: train_fscore value: [0.9193858 0.91258405 0.91221374 0.91187739 0.91626564 0.9132948 0.91204589 0.91066282 0.91434071 0.91483254] mean value: 0.9137503384577215 key: test_precision value: [0.8 0.87719298 0.90909091 0.85483871 0.85714286 0.81538462 0.84745763 0.82089552 0.85245902 0.89830508] mean value: 0.8532767324397851 key: train_precision value: [0.88213628 0.87638376 0.87067395 0.8733945 0.87985213 0.87777778 0.87043796 
0.87292818 0.87962963 0.87545788] mean value: 0.8758672033376387 key: test_recall value: [0.87272727 0.90909091 0.90909091 0.96363636 0.96428571 0.94642857 0.89285714 0.98214286 0.94545455 0.96363636] mean value: 0.934935064935065 key: train_recall value: [0.95991984 0.95190381 0.95791583 0.95390782 0.95582329 0.95180723 0.95783133 0.95180723 0.95190381 0.95791583] mean value: 0.9550736010172955 key: test_roc_auc value: [0.82922078 0.89204545 0.9099026 0.90146104 0.90032468 0.86412338 0.86461039 0.88198052 0.89090909 0.92727273] mean value: 0.8861850649350649 key: train_roc_auc value: [0.91570289 0.90868283 0.90767278 0.9076768 0.91278139 0.90977135 0.90777338 0.90676534 0.91082164 0.91082164] mean value: 0.9098470032434346 key: test_jcc value: [0.71641791 0.80645161 0.83333333 0.828125 0.83076923 0.77941176 0.76923077 0.80882353 0.8125 0.86885246] mean value: 0.8053915609818361 key: train_jcc value: [0.85079929 0.83922261 0.83859649 0.83802817 0.84547069 0.84042553 0.83831283 0.83597884 0.84219858 0.84303351] mean value: 0.8412066546000827 MCC on Blind test: 0.61 Accuracy on Blind test: 0.83 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.31536579 0.30559421 0.21722341 0.35626459 0.46322298 0.34366488 0.35091496 0.33481693 0.30926394 0.29198647] mean value: 0.3288318157196045 key: score_time value: [0.01272392 0.01928926 0.01933956 0.02210021 0.01939154 0.01926899 0.01933599 0.02292752 0.0124402 0.02368855] mean value: 0.01905057430267334 key: test_mcc value: [0.66058982 0.76698119 0.78434561 0.80845318 0.80802876 0.75530907 0.74951538 0.78087736 0.78651226 0.85681441] mean value: 0.7757427039457268 key: train_mcc value: [0.83473792 0.82940212 0.83847423 0.83819631 0.82857913 0.83936409 0.838222 0.81679117 0.82443086 0.8253123 ] mean value: 0.8313510135109393 key: test_accuracy value: [0.82882883 0.88288288 0.89189189 0.9009009 0.9009009 0.87387387 0.87387387 0.88288288 0.89090909 0.92727273] mean value: 0.8854217854217854 key: train_accuracy value: [0.91574724 0.91374122 0.91775326 0.91775326 0.91273821 0.91875627 0.91775326 0.90672016 0.91082164 0.91082164] mean value: 0.9142606175239144 key: test_fscore value: [0.83478261 0.88495575 0.89285714 0.90598291 0.90756303 0.88333333 0.87931034 0.89430894 0.89655172 0.92982456] mean value: 0.8909470341749964 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:136: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:139: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.9193858 0.91666667 0.92115385 0.92100193 0.91626564 0.9212828 0.92084942 0.91066282 0.91434071 0.91483254] mean value: 0.9176442168185582 key: test_precision value: [0.8 0.86206897 0.87719298 0.85483871 0.85714286 0.828125 0.85 0.82089552 0.85245902 0.89830508] mean value: 0.8501028138320923 key: train_precision value: [0.88213628 0.88742964 0.88539741 0.88682746 0.87985213 0.89265537 0.8866171 0.87292818 0.87962963 0.87545788] mean value: 0.8828931069088831 key: test_recall value: [0.87272727 0.90909091 0.90909091 0.96363636 0.96428571 0.94642857 0.91071429 0.98214286 0.94545455 0.96363636] mean value: 0.9367207792207792 key: train_recall value: [0.95991984 0.94789579 0.95991984 0.95791583 0.95582329 0.95180723 0.95783133 0.95180723 0.95190381 0.95791583] mean value: 0.9552740018188988 key: test_roc_auc value: [0.82922078 0.88311688 0.89204545 0.90146104 0.90032468 0.87321429 0.87353896 0.88198052 0.89090909 0.92727273] mean value: 0.8853084415584416 key: train_roc_auc value: [0.91570289 0.91370693 0.91771092 0.91771294 0.91278139 0.91878939 0.91779342 0.90676534 0.91082164 0.91082164] mean value: 0.9142606498136836 key: test_jcc value: [0.71641791 0.79365079 0.80645161 0.828125 0.83076923 0.79104478 0.78461538 0.80882353 0.8125 0.86885246] mean value: 0.8041250696933957 key: train_jcc value: [0.85079929 0.84615385 0.85383244 0.85357143 0.84547069 0.85405405 0.85330948 0.83597884 0.84219858 0.84303351] mean value: 0.847840216154083 MCC on Blind test: 0.62 Accuracy on Blind test: 0.84 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04093432 0.04260874 0.05253553 0.05427361 0.04360962 0.04467535 0.04421616 0.05421448 0.04414415 0.04397631] mean value: 0.04651882648468018 key: score_time value: [0.01233125 0.0134294 0.01342249 0.01335311 0.0134294 0.01324058 0.01339912 0.01341963 0.01598525 0.01339984] mean value: 0.013541007041931152 key: test_mcc value: [0.60357143 0.82205752 0.72978244 0.69959151 0.78818464 0.66597107 0.73247207 0.69891539 0.80332642 0.7823356 ] mean value: 0.7326208073785166 key: train_mcc value: [0.78495504 0.77672016 0.79042861 0.78136227 0.77254608 0.79725623 0.77694243 0.78307581 0.77994086 0.76055213] mean value: 0.7803779608362683 key: test_accuracy value: [0.8018018 0.90990991 0.86486486 0.84684685 0.89189189 0.82882883 0.86486486 0.84684685 0.9 0.89090909] mean value: 0.8646764946764947 key: train_accuracy value: [0.89167503 0.88766299 0.89368104 0.88966901 0.88565697 0.89769308 0.88766299 0.89067202 0.88877756 0.87975952] mean value: 0.889291019350637 key: test_fscore value: [0.8 0.9122807 0.86238532 0.85470085 0.89830508 0.84297521 0.87179487 0.85714286 0.90434783 0.89285714] mean value: 0.8696789866795319 key: train_fscore value: [0.89514563 0.89105058 0.89827255 0.89361702 0.88867188 0.90097087 0.89105058 0.89407191 0.89296046 0.8828125 ] mean value: 0.8928623998583001 key: 
test_precision value: [0.8 0.88135593 0.87037037 0.80645161 0.85483871 0.78461538 0.83606557 0.80952381 0.86666667 0.87719298] mean value: 0.8387081042186898 key: train_precision value: [0.86817326 0.8657845 0.86187845 0.8635514 0.86501901 0.87218045 0.86415094 0.86629002 0.8605948 0.86095238] mean value: 0.8648575213221116 key: test_recall value: [0.8 0.94545455 0.85454545 0.90909091 0.94642857 0.91071429 0.91071429 0.91071429 0.94545455 0.90909091] mean value: 0.9042207792207793 key: train_recall value: [0.9238477 0.91783567 0.93787575 0.9258517 0.91365462 0.93172691 0.91967871 0.92369478 0.92785571 0.90581162] mean value: 0.9227833176392947 key: test_roc_auc value: [0.80178571 0.91022727 0.86477273 0.8474026 0.8913961 0.82808442 0.86444805 0.84626623 0.9 0.89090909] mean value: 0.8645292207792208 key: train_roc_auc value: [0.89164272 0.8876327 0.89363667 0.88963268 0.88568502 0.89772718 0.88769507 0.8907051 0.88877756 0.87975952] mean value: 0.8892894222179298 key: test_jcc value: [0.66666667 0.83870968 0.75806452 0.74626866 0.81538462 0.72857143 0.77272727 0.75 0.82539683 0.80645161] mean value: 0.770824127191484 key: train_jcc value: [0.81019332 0.80350877 0.81533101 0.80769231 0.79964851 0.81978799 0.80350877 0.80843585 0.80662021 0.79020979] mean value: 0.8064936527280264 MCC on Blind test: 0.64 Accuracy on Blind test: 0.84 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [1.15100551 1.03059697 1.11383557 1.17403126 1.12681389 1.05125737 1.17251849 1.03575873 1.04927802 1.27532935]
mean value: 1.118042516708374
key: score_time
value: [0.01579118 0.01581717 0.01901102 0.01574683 0.01594114 0.01595163 0.02020645 0.01380491 0.01625299 0.01591539]
mean value: 0.016443872451782228
key: test_mcc
value: [0.62175325 0.87402597 0.82182846 0.69959151 0.8049036 0.73247207 0.82027988 0.75979502 0.69102332 0.7499303 ]
mean value: 0.7575603381593177
key: train_mcc
value: [0.81462589 0.87037589 0.85431279 0.85417564 0.84386768 0.86027684 0.84810102 0.86845461 0.8723112 0.87219196]
mean value: 0.8558693517884952
key: test_accuracy
value: [0.81081081 0.93693694 0.90990991 0.84684685 0.9009009 0.86486486 0.90990991 0.87387387 0.84545455 0.87272727]
mean value: 0.8772235872235872
key: train_accuracy
value: [0.90672016 0.93480441 0.92678034 0.92678034 0.9217653 0.92978937 0.92377131 0.9338014 0.93587174 0.93587174]
mean value: 0.9275956124887689
key: test_fscore
value: [0.81081081 0.93693694 0.90566038 0.85470085 0.90598291 0.87179487 0.9122807 0.8852459 0.84684685 0.87931034]
mean value: 0.8809570552653033
key: train_fscore
value: [0.90926829 0.93621197 0.92836114 0.92822026 0.92277228 0.93110236 0.92504931 0.93516699 0.93700787 0.93688363]
mean value: 0.9290044105640144
key: test_precision
value: [0.80357143 0.92857143 0.94117647 0.80645161 0.86885246 0.83606557 0.89655172 0.81818182 0.83928571 0.83606557]
mean value: 0.8574773803797159
key: train_precision
value: [0.88593156 0.91730769 0.90961538 0.91119691 0.91015625 0.91312741 0.90891473 0.91538462 0.92069632 0.9223301 ]
mean value: 0.9114660976288571
key: test_recall
value: [0.81818182 0.94545455 0.87272727 0.90909091 0.94642857 0.91071429 0.92857143 0.96428571 0.85454545 0.92727273]
mean value: 0.9077272727272727
key: train_recall
value: [0.93386774 0.95591182 0.94789579 0.94589178 0.93574297 0.9497992 0.94176707 0.95582329 0.95390782 0.95190381]
mean value: 0.9472511287635512
key: test_roc_auc
value: [0.81087662 0.93701299 0.90957792 0.8474026 0.90048701 0.86444805 0.90974026 0.87305195 0.84545455 0.87272727]
mean value: 0.877077922077922
key: train_roc_auc
value: [0.9066929 0.93478322 0.92675914 0.92676115 0.9217793 0.92980942 0.92378935 0.93382347 0.93587174 0.93587174]
mean value: 0.9275941441115163
key: test_jcc
value: [0.68181818 0.88135593 0.82758621 0.74626866 0.828125 0.77272727 0.83870968 0.79411765 0.734375 0.78461538]
mean value: 0.7889698959455377
key: train_jcc
value: [0.83363148 0.8800738 0.86630037 0.86605505 0.85661765 0.87108656 0.86055046 0.87822878 0.88148148 0.8812616 ]
mean value: 0.8675287218964672
MCC on Blind test: 0.63
Accuracy on Blind test: 0.85
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), 
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01754236 0.01333523 0.01295114 0.01299715 0.01282549 0.013062 0.01278353 0.01383567 0.01366758 0.01300454] mean value: 0.013600468635559082 key: score_time value: [0.01324677 0.00990438 0.00989175 0.00963712 0.01019049 0.01029158 0.01033616 0.00978732 0.01000738 0.01002502] mean value: 0.01033179759979248 key: test_mcc value: [0.51517746 0.4775974 0.46159963 0.49641957 0.67720229 0.51815539 0.51398927 0.41165822 0.56363636 0.6401844 ] mean value: 0.527562000397169 key: train_mcc value: [0.53949298 0.54988148 0.54586088 0.56805192 0.55221636 0.54760475 0.5415118 0.5415118 0.55760823 0.53774946] mean value: 0.548148963922759 key: test_accuracy value: [0.75675676 0.73873874 0.72972973 0.74774775 0.83783784 0.75675676 0.75675676 0.7027027 0.78181818 0.81818182] mean value: 0.7627027027027027 key: train_accuracy value: [0.76930792 0.77432297 0.77231695 0.78335005 0.77432297 0.77331996 0.77031093 0.77031093 0.77855711 0.76853707] mean value: 0.7734656876440946 key: test_fscore value: [0.74285714 0.73873874 0.71153846 0.73584906 0.84482759 0.74285714 0.76521739 0.67961165 0.78181818 0.80769231] mean value: 0.755100766010243 key: train_fscore value: [0.7628866 0.76683938 0.76476684 0.77593361 0.76038339 0.76604555 0.76318511 0.76318511 0.77379734 0.76258993] mean value: 0.7659612844765213 key: test_precision value: [0.78 0.73214286 0.75510204 0.76470588 0.81666667 0.79591837 0.74576271 
0.74468085 0.78181818 0.85714286] mean value: 0.7773940416215006 key: train_precision value: [0.78556263 0.79399142 0.79184549 0.80430108 0.80952381 0.79059829 0.78678038 0.78678038 0.79079498 0.78270042] mean value: 0.7922878886569598 key: test_recall value: [0.70909091 0.74545455 0.67272727 0.70909091 0.875 0.69642857 0.78571429 0.625 0.78181818 0.76363636] mean value: 0.7363961038961039 key: train_recall value: [0.74148297 0.74148297 0.73947896 0.749499 0.71686747 0.74297189 0.74096386 0.74096386 0.75751503 0.74348697] mean value: 0.741471296005666 key: test_roc_auc value: [0.75633117 0.7387987 0.72922078 0.7474026 0.8375 0.75730519 0.75649351 0.70340909 0.78181818 0.81818182] mean value: 0.7626461038961039 key: train_roc_auc value: [0.76933586 0.77435594 0.77234992 0.78338404 0.7742654 0.77328955 0.77028153 0.77028153 0.77855711 0.76853707] mean value: 0.7734637950599995 key: test_jcc value: [0.59090909 0.58571429 0.55223881 0.58208955 0.73134328 0.59090909 0.61971831 0.51470588 0.64179104 0.67741935] mean value: 0.6086838701150438 key: train_jcc value: [0.61666667 0.62184874 0.61912752 0.63389831 0.61340206 0.62080537 0.61705686 0.61705686 0.63105175 0.61627907] mean value: 0.6207193194072481 MCC on Blind test: 0.44 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01254296 0.01719975 0.01711202 0.01715064 0.01714396 0.01716685 0.0172646 0.01705408 0.01714993 0.01713777] mean value: 0.016692256927490233 key: score_time value: [0.01235557 0.01248646 0.01242685 0.01279688 0.01244903 0.01248336 0.01246428 0.01228619 0.01236367 0.01239777] mean value: 0.01245100498199463 key: test_mcc value: [0.53168696 0.53168696 0.5365027 0.60383519 0.64303575 0.6048892 0.65842676 0.60383519 0.71097366 0.62075223] mean value: 0.6045624604926498 key: train_mcc value: [0.62690345 0.63122939 0.61741439 0.64894709 0.632912 0.6491123 0.62326104 0.64713651 0.64529188 0.60949387] mean value: 0.6331701922310734 key: test_accuracy value: [0.76576577 0.76576577 0.76576577 0.8018018 0.81981982 0.8018018 0.82882883 0.8018018 0.85454545 0.80909091] mean value: 0.8014987714987715 key: train_accuracy value: [0.81344032 0.81544634 0.80842528 0.82447342 0.81644935 0.82447342 0.8114343 0.82347041 0.82264529 0.80460922] mean value: 0.8164867347533582 key: test_fscore value: [0.75925926 0.75925926 0.74509804 0.7962963 0.83050847 0.81034483 0.83478261 0.80701754 0.85964912 0.8 ] mean value: 0.8002215431555298 key: train_fscore value: [0.81287726 0.81262729 0.80450358 0.82482482 0.81681682 0.82621648 0.80777096 0.82539683 0.8224674 0.80162767] mean value: 0.815512912261371 key: test_precision value: [0.77358491 0.77358491 0.80851064 0.81132075 0.79032258 0.78333333 0.81355932 0.79310345 0.83050847 0.84 ] mean value: 0.8017828363200135 key: train_precision value: [0.81616162 0.82608696 0.82217573 0.824 0.81437126 0.8172888 0.82291667 0.81568627 0.82329317 0.81404959] mean 
value: 0.819603006460176 key: test_recall value: [0.74545455 0.74545455 0.69090909 0.78181818 0.875 0.83928571 0.85714286 0.82142857 0.89090909 0.76363636] mean value: 0.8011038961038961 key: train_recall value: [0.80961924 0.7995992 0.78757515 0.8256513 0.81927711 0.83534137 0.79317269 0.83534137 0.82164329 0.78957916] mean value: 0.8116799864789821 key: test_roc_auc value: [0.76558442 0.76558442 0.7650974 0.80162338 0.81931818 0.80146104 0.82857143 0.80162338 0.85454545 0.80909091] mean value: 0.80125 key: train_roc_auc value: [0.81344416 0.81546225 0.80844621 0.82447224 0.81645218 0.82448431 0.811416 0.82348231 0.82264529 0.80460922] mean value: 0.8164914165680759 key: test_jcc value: [0.6119403 0.6119403 0.59375 0.66153846 0.71014493 0.68115942 0.71641791 0.67647059 0.75384615 0.66666667] mean value: 0.668387472557535 key: train_jcc value: [0.68474576 0.68439108 0.67294521 0.70187394 0.69035533 0.70389171 0.67753002 0.7027027 0.69846678 0.66893039] mean value: 0.6885832913576179 MCC on Blind test: 0.5 Accuracy on Blind test: 0.79 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01590347 0.01198936 0.01165318 0.01147985 0.01264477 0.01251578 0.01235557 0.01222801 0.01148844 0.01146007] mean value: 0.01237185001373291 key: score_time value: [0.04040146 0.0159626 0.0156765 0.01903772 0.01762676 0.01922369 0.01665974 0.01898384 0.01707697 0.02237201] mean value: 0.020302128791809083 key: test_mcc value: [0.70340005 0.47838827 0.57407396 0.48186817 0.6048892 0.54348795 0.69757747 0.4778799 0.62325024 0.56400939] mean value: 0.5748824590071774 key: train_mcc value: [0.73890566 0.71341062 0.73549304 0.76812618 0.73773179 0.74176802 0.75322467 0.70564267 0.73393064 0.72957779] mean value: 0.7357811061752336 key: test_accuracy value: [0.84684685 0.73873874 0.78378378 0.73873874 0.8018018 0.76576577 0.83783784 0.73873874 0.80909091 0.78181818] mean value: 0.7843161343161343 key: train_accuracy value: [0.86760281 0.8555667 0.86559679 0.88164493 0.8665998 0.86860582 0.87462387 0.85155466 0.86472946 0.86272545] mean value: 0.8659250295978115 key: test_fscore value: [0.85714286 0.74336283 0.79661017 0.75213675 0.81034483 0.79032258 0.85714286 0.74782609 0.82051282 0.78571429] mean value: 0.7961116069187395 key: train_fscore value: [0.8740458 0.86127168 0.87262357 0.88804554 0.87345385 0.8753568 0.88061127 0.85741811 0.87179487 0.86964795] mean value: 0.8724269457459887 key: test_precision value: [0.796875 0.72413793 0.74603175 0.70967742 0.78333333 0.72058824 0.77142857 0.72881356 0.77419355 0.77192982] mean value: 0.7527009168747624 key: train_precision value: [0.83424408 0.82931354 0.83001808 0.84324324 0.83001808 0.8318264 0.83970856 0.82407407 
0.82851986 0.82789855] mean value: 0.8318864476214571 key: test_recall value: [0.92727273 0.76363636 0.85454545 0.8 0.83928571 0.875 0.96428571 0.76785714 0.87272727 0.8 ] mean value: 0.846461038961039 key: train_recall value: [0.91783567 0.89579158 0.91983968 0.93787575 0.92168675 0.92369478 0.92570281 0.8935743 0.91983968 0.91583166] mean value: 0.9171672662594265 key: test_roc_auc value: [0.84756494 0.73896104 0.78441558 0.73928571 0.80146104 0.76477273 0.83668831 0.73847403 0.80909091 0.78181818] mean value: 0.7842532467532468 key: train_roc_auc value: [0.86755237 0.85552631 0.86554233 0.88158848 0.866655 0.86866102 0.87467505 0.85159677 0.86472946 0.86272545] mean value: 0.8659252239418596 key: test_jcc value: [0.75 0.5915493 0.66197183 0.60273973 0.68115942 0.65333333 0.75 0.59722222 0.69565217 0.64705882] mean value: 0.6630686826075827 key: train_jcc value: [0.77627119 0.75634518 0.77403035 0.79863481 0.77533784 0.77834179 0.78668942 0.75042159 0.77272727 0.76936027] mean value: 0.7738159708974901 MCC on Blind test: 0.43 Accuracy on Blind test: 0.74 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.05335617 0.05310178 0.0520792 0.05209589 0.06355214 0.05141449 0.06059766 0.06146669 0.06533003 0.06051207] mean value: 0.05735061168670654 key: score_time value: [0.01984859 0.01980829 0.01976943 0.01942945 0.02001691 0.01954556 0.02005029 0.02004004 0.02193451 0.01983261] mean value: 0.020027565956115722 key: test_mcc value: [0.61044509 0.77216596 0.67619361 0.66058982 0.82447186 0.5928164 0.71335883 0.64590588 0.73720978 0.82035423] mean value: 0.7053511460043415 key: train_mcc value: [0.75534914 0.75786007 0.75012721 0.76480992 0.74430166 0.75943336 0.74082414 0.7473325 0.72307793 0.74636831] mean value: 0.7489484245836903 key: test_accuracy value: [0.8018018 0.88288288 0.83783784 0.82882883 0.90990991 0.79279279 0.85585586 0.81981982 0.86363636 0.90909091] mean value: 0.8502457002457002 key: train_accuracy value: [0.87562688 0.87662989 0.87261785 0.87963892 0.86960883 0.8776329 0.86860582 0.87161484 0.85971944 0.87074148] mean value: 0.8722436849627038 key: test_fscore value: [0.81355932 0.88888889 0.83928571 0.83478261 0.91525424 0.80991736 0.86206897 0.83333333 0.87394958 0.9122807 ] mean value: 0.8583320707001083 key: train_fscore value: [0.88190476 0.88319088 0.87962085 0.88657845 0.87666034 0.88358779 0.87464115 0.8778626 0.86641221 0.87772512] mean value: 0.8788184151866292 key: test_precision value: [0.76190476 0.83870968 0.8245614 0.8 0.87096774 0.75384615 0.83333333 0.78125 0.8125 0.88135593] mean value: 0.815842900415125 key: train_precision value: [0.84029038 0.83935018 0.83453237 0.83899821 0.83093525 0.84181818 0.83546618 0.83636364 0.82695811 
0.83273381] mean value: 0.8357446314558294 key: test_recall value: [0.87272727 0.94545455 0.85454545 0.87272727 0.96428571 0.875 0.89285714 0.89285714 0.94545455 0.94545455] mean value: 0.9061363636363636 key: train_recall value: [0.92785571 0.93186373 0.92985972 0.93987976 0.92771084 0.92971888 0.91767068 0.92369478 0.90981964 0.92785571] mean value: 0.9265929449259966 key: test_roc_auc value: [0.80243506 0.88344156 0.83798701 0.82922078 0.90941558 0.79204545 0.85551948 0.81915584 0.86363636 0.90909091] mean value: 0.8501948051948052 key: train_roc_auc value: [0.87557444 0.87657443 0.87256038 0.87957843 0.86966704 0.87768509 0.86865498 0.87166703 0.85971944 0.87074148] mean value: 0.8722422757160908 key: test_jcc value: [0.68571429 0.8 0.72307692 0.71641791 0.84375 0.68055556 0.75757576 0.71428571 0.7761194 0.83870968] mean value: 0.7536205227060427 key: train_jcc value: [0.78875639 0.79081633 0.78510998 0.79626486 0.78040541 0.79145299 0.77721088 0.78231293 0.76430976 0.78209459] mean value: 0.7838734118999983 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [3.75282788 3.81552005 3.63571382 3.76962566 3.72686815 3.62004423 3.12245107 3.29060173 3.8202343 3.61941886] mean value: 3.6173305749893188 key: score_time value: [0.01521969 0.01497841 0.0150671 0.01494384 0.01501822 0.01528382 0.01300669 0.0128808 0.01538062 0.01552033] mean value: 0.014729952812194825 key: test_mcc value: [0.86504296 0.85816689 0.89249761 0.82205752 0.89704631 0.82447186 0.84111937 0.82824452 0.92973479 0.87402845] mean value: 0.863241028464841 key: train_mcc value: [0.99599599 0.99799598 0.9939999 0.99598796 0.99399998 0.99598796 0.99398395 0.97621121 0.997998 0.99400594] mean value: 0.993616687465768 key: test_accuracy value: [0.92792793 0.92792793 0.94594595 0.90990991 0.94594595 0.90990991 0.91891892 0.90990991 0.96363636 0.93636364] mean value: 0.9296396396396397 key: train_accuracy value: [0.99799398 0.99899699 0.99699097 0.99799398 0.99699097 0.99799398 0.99699097 0.98796389 0.998998 0.99699399] mean value: 0.9967907731209661 key: test_fscore value: [0.93220339 0.92982456 0.94642857 0.9122807 0.94915254 0.91525424 0.92307692 0.91666667 0.96491228 0.9380531 ] mean value: 0.9327852971868469 key: train_fscore value: [0.99799197 0.998999 0.997003 0.99799599 0.996997 0.99799197 
0.99699097 0.98809524 0.998999 0.997003 ] mean value: 0.9968067127741923 key: test_precision value: [0.87301587 0.89830508 0.92982456 0.88135593 0.90322581 0.87096774 0.8852459 0.859375 0.93220339 0.9137931 ] mean value: 0.894731239467376 key: train_precision value: [1. 0.998 0.9940239 0.99799599 0.99401198 0.99799197 0.99599198 0.97647059 0.998 0.9940239 ] mean value: 0.9946510316871529 key: test_recall value: [1. 0.96363636 0.96363636 0.94545455 1. 0.96428571 0.96428571 0.98214286 1. 0.96363636] mean value: 0.9747077922077922 key: train_recall value: [0.99599198 1. 1. 0.99799599 1. 0.99799197 0.99799197 1. 1. 1. ] mean value: 0.9989971911694876 key: test_roc_auc value: [0.92857143 0.92824675 0.9461039 0.91022727 0.94545455 0.90941558 0.91850649 0.90925325 0.96363636 0.93636364] mean value: 0.9295779220779221 key: train_roc_auc value: [0.99799599 0.99899598 0.99698795 0.99799398 0.99699399 0.99799398 0.99699198 0.98797595 0.998998 0.99699399] mean value: 0.9967921787349799 key: test_jcc value: [0.87301587 0.86885246 0.89830508 0.83870968 0.90322581 0.84375 0.85714286 0.84615385 0.93220339 0.88333333] mean value: 0.8744692327109542 key: train_jcc value: [0.99599198 0.998 0.9940239 0.996 0.99401198 0.99599198 0.994 0.97647059 0.998 0.9940239 ] mean value: 0.9936514340984011 MCC on Blind test: 0.58 Accuracy on Blind test: 0.83 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.06268549 0.04895139 0.04552436 0.04649806 0.05077744 0.04187751 0.04737687 0.04394555 0.04726696 0.0483892 ] mean value: 0.048329281806945804 key: score_time value: [0.0103085 0.01011729 0.00918198 0.01004171 0.00924993 0.00930953 0.00923204 0.00923681 0.00998807 0.00920033] mean value: 0.009586620330810546 key: test_mcc value: [0.96459895 0.80845318 0.91127765 0.93038564 0.81771432 0.89414155 0.856354 0.94608644 0.92973479 0.92973479] mean value: 0.8988481309656139 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98198198 0.9009009 0.95495495 0.96396396 0.9009009 0.94594595 0.92792793 0.97297297 0.96363636 0.96363636] mean value: 0.9476822276822277 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98214286 0.90598291 0.95575221 0.96491228 0.91056911 0.94827586 0.92982456 0.97345133 0.96491228 0.96491228] mean value: 0.9500735674217566 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96491228 0.85483871 0.93103448 0.93220339 0.8358209 0.91666667 0.9137931 0.96491228 0.93220339 0.93220339] mean value: 0.9178588588968405 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96363636 0.98181818 1. 1. 0.98214286 0.94642857 0.98214286 1. 1. ] mean value: 0.9856168831168831 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98214286 0.90146104 0.95519481 0.96428571 0.9 0.94561688 0.92775974 0.97288961 0.96363636 0.96363636] mean value: 0.9476623376623377 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96491228 0.828125 0.91525424 0.93220339 0.8358209 0.90163934 0.86885246 0.94827586 0.93220339 0.93220339] mean value: 0.9059490248351457 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.66 Accuracy on Blind test: 0.87 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.17697835 0.17841768 0.18904471 0.1832931 0.18118811 0.18383741 0.18178773 0.18106866 0.18363118 0.18622208] mean value: 0.18254690170288085 key: score_time value: [0.01878119 0.01896453 0.01970887 0.0195868 0.01971936 0.02070141 0.02028441 0.02025509 0.01984024 0.02044296] mean value: 0.019828486442565917 key: test_mcc value: [0.94735177 0.85816689 0.91127765 0.83912942 0.89704631 0.94608644 0.83793444 0.91003577 0.94686415 0.89149871] mean value: 0.8985391553279924 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.97297297 0.92792793 0.95495495 0.91891892 0.94594595 0.97297297 0.91891892 0.95495495 0.97272727 0.94545455] mean value: 0.9485749385749386 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97345133 0.92982456 0.95575221 0.92035398 0.94915254 0.97345133 0.92035398 0.95575221 0.97345133 0.94642857] mean value: 0.9497972046886377 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94827586 0.89830508 0.93103448 0.89655172 0.90322581 0.96491228 0.9122807 0.94736842 0.94827586 0.92982456] mean value: 0.9280054787144139 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96363636 0.98181818 0.94545455 1. 0.98214286 0.92857143 0.96428571 1. 0.96363636] mean value: 0.9729545454545454 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97321429 0.92824675 0.95519481 0.91915584 0.94545455 0.97288961 0.91883117 0.95487013 0.97272727 0.94545455] mean value: 0.9486038961038961 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94827586 0.86885246 0.91525424 0.85245902 0.90322581 0.94827586 0.85245902 0.91525424 0.94827586 0.89830508] mean value: 0.9050637443783822 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.5 Accuracy on Blind test: 0.81 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01411533 0.01404738 0.01408291 0.01340747 0.01328039 0.01412058 0.01400352 0.01396227 0.01360893 0.01377416] mean value: 0.013840293884277344 key: score_time value: [0.01007175 0.0093317 0.01000047 0.00991845 0.01003599 0.00996137 0.00995827 0.01003003 0.01001978 0.00936818] mean value: 0.009869599342346191 key: test_mcc value: [0.86504296 0.76054489 0.7763355 0.75592959 0.7964953 0.86075909 0.8049036 0.82447186 0.86373129 0.79507028] mean value: 0.810328436196157 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92792793 0.87387387 0.88288288 0.87387387 0.89189189 0.92792793 0.9009009 0.90990991 0.92727273 0.89090909] mean value: 0.9007371007371007 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93220339 0.88333333 0.8907563 0.88135593 0.90163934 0.93220339 0.90598291 0.91525424 0.93220339 0.9 ] mean value: 0.9074932225082594 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.87301587 0.81538462 0.828125 0.82539683 0.83333333 0.88709677 0.86885246 0.87096774 0.87301587 0.83076923] mean value: 0.8505957726061176 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96363636 0.96363636 0.94545455 0.98214286 0.98214286 0.94642857 0.96428571 1. 0.98181818] mean value: 0.9729545454545454 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92857143 0.87467532 0.8836039 0.87451299 0.89107143 0.92743506 0.90048701 0.90941558 0.92727273 0.89090909] mean value: 0.9007954545454546 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.87301587 0.79104478 0.8030303 0.78787879 0.82089552 0.87301587 0.828125 0.84375 0.87301587 0.81818182] mean value: 0.831195382664599 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.90355897 2.85356164 2.86990976 2.85194898 2.86117744 2.82591081 2.8085959 2.80563903 2.87003446 2.81331229] mean value: 2.8463649272918703 key: score_time value: [0.10372114 0.09917402 0.10634851 0.10654712 0.10809731 0.10232544 0.10662699 0.10796857 0.09895921 0.15853858] mean value: 0.10983068943023681 key: test_mcc value: [0.94735177 0.89427626 0.9461039 0.94735177 0.88077101 0.88077101 0.87508299 0.92850223 0.96427411 0.94686415] mean value: 0.9211349204607814 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97297297 0.94594595 0.97297297 0.97297297 0.93693694 0.93693694 0.93693694 0.96396396 0.98181818 0.97272727] mean value: 0.9594185094185095 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97345133 0.94736842 0.97297297 0.97345133 0.94117647 0.94117647 0.93913043 0.96491228 0.98214286 0.97345133] mean value: 0.960923389013018 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94827586 0.91525424 0.96428571 0.94827586 0.88888889 0.88888889 0.91525424 0.94827586 0.96491228 0.94827586] mean value: 0.9330587695617379 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.98181818 0.98181818 1. 1. 1. 0.96428571 0.98214286 1. 1. ] mean value: 0.9910064935064935 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.97321429 0.94626623 0.97305195 0.97321429 0.93636364 0.93636364 0.93668831 0.9637987 0.98181818 0.97272727] mean value: 0.9593506493506494 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94827586 0.9 0.94736842 0.94827586 0.88888889 0.88888889 0.8852459 0.93220339 0.96491228 0.94827586] mean value: 0.9252335357208913 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.89 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, 
tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.18460464 1.13488936 1.1510613 1.13592005 1.1451478 1.14192295 1.12877488 1.1883049 1.133883 1.14420176] mean value: 1.1488710641860962 key: score_time value: [0.23078775 0.26031542 0.23725033 0.27050185 0.21965718 0.2571454 0.18469596 0.23342299 0.19362545 0.28665137] mean value: 0.2374053716659546 key: test_mcc value: [0.91127765 0.83912942 0.91003577 0.89188312 0.88077101 0.86075909 0.91355091 0.91003577 0.94561086 0.92973479] mean value: 0.899278839053012 key: train_mcc value: [0.96427099 0.95213028 0.95624352 0.95819837 0.95441822 0.95017394 0.95810779 0.95429502 0.95628827 0.95618835] mean value: 0.9560314751624464 key: test_accuracy value: [0.95495495 0.91891892 0.95495495 0.94594595 0.93693694 0.92792793 0.95495495 0.95495495 0.97272727 0.96363636] mean value: 0.9485913185913186 key: train_accuracy value: [0.98194584 0.97592778 0.9779338 0.97893681 0.97693079 0.97492477 0.97893681 0.97693079 0.97795591 0.97795591] mean value: 0.9778379225853915 key: test_fscore value: [0.95575221 0.92035398 0.95412844 0.94545455 0.94117647 0.93220339 0.95726496 0.95575221 0.97297297 0.96491228] mean value: 0.9499971464259592 key: train_fscore value: [0.98221344 0.97623762 0.97826087 0.97922849 0.97729516 0.97522299 0.97914598 0.97725025 0.97826087 0.97821782] mean value: 0.9781333491434867 key: test_precision value: [0.93103448 0.89655172 0.96296296 0.94545455 0.88888889 0.88709677 0.91803279 0.94736842 0.96428571 0.93220339] mean value: 0.9273879690450597 key: train_precision value: [0.96881092 0.96477495 
0.96491228 0.96679688 0.96116505 0.962818 0.96856582 0.96296296 0.96491228 0.9667319 ] mean value: 0.9652451032642626 key: test_recall value: [0.98181818 0.94545455 0.94545455 0.94545455 1. 0.98214286 1. 0.96428571 0.98181818 1. ] mean value: 0.9746428571428571 key: train_recall value: [0.99599198 0.98797595 0.99198397 0.99198397 0.9939759 0.98795181 0.98995984 0.99196787 0.99198397 0.98997996] mean value: 0.9913755221285945 key: test_roc_auc value: [0.95519481 0.91915584 0.95487013 0.94594156 0.93636364 0.92743506 0.95454545 0.95487013 0.97272727 0.96363636] mean value: 0.948474025974026 key: train_roc_auc value: [0.98193173 0.97591569 0.97791969 0.97892371 0.97694787 0.97493783 0.97894786 0.97694586 0.97795591 0.97795591] mean value: 0.977838206533549 key: test_jcc value: [0.91525424 0.85245902 0.9122807 0.89655172 0.88888889 0.87301587 0.91803279 0.91525424 0.94736842 0.93220339] mean value: 0.9051309276535179 key: train_jcc value: [0.96504854 0.95357834 0.95744681 0.95930233 0.95559846 0.9516441 0.95914397 0.95551257 0.95744681 0.95736434] mean value: 0.9572086261518494 MCC on Blind test: 0.78 Accuracy on Blind test: 0.91 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02212071 0.01712632 0.017241 0.01725721 0.03649306 0.01726389 0.01711011 0.01731777 0.01719499 0.01736307] mean value: 0.019648814201354982 key: score_time value: [0.01253033 0.01254916 0.01253748 0.01241183 0.01262856 0.01252508 0.01236796 0.01235127 0.01247168 0.01250958] mean value: 0.012488293647766113 key: test_mcc value: [0.53168696 0.53168696 0.5365027 0.60383519 0.64303575 0.6048892 0.65842676 0.60383519 0.71097366 0.62075223] mean value: 0.6045624604926498 key: train_mcc value: [0.62690345 0.63122939 0.61741439 0.64894709 0.632912 0.6491123 0.62326104 0.64713651 0.64529188 0.60949387] mean value: 0.6331701922310734 key: test_accuracy value: [0.76576577 0.76576577 0.76576577 0.8018018 0.81981982 0.8018018 0.82882883 0.8018018 0.85454545 0.80909091] mean value: 0.8014987714987715 key: train_accuracy value: [0.81344032 0.81544634 0.80842528 0.82447342 0.81644935 0.82447342 0.8114343 0.82347041 0.82264529 0.80460922] mean value: 0.8164867347533582 key: test_fscore value: [0.75925926 0.75925926 0.74509804 0.7962963 0.83050847 0.81034483 0.83478261 0.80701754 0.85964912 0.8 ] mean value: 0.8002215431555298 key: train_fscore value: [0.81287726 0.81262729 0.80450358 0.82482482 0.81681682 0.82621648 0.80777096 0.82539683 0.8224674 0.80162767] mean value: 0.815512912261371 key: test_precision value: [0.77358491 0.77358491 0.80851064 0.81132075 0.79032258 0.78333333 0.81355932 0.79310345 0.83050847 0.84 ] mean value: 0.8017828363200135 key: train_precision value: [0.81616162 0.82608696 0.82217573 0.824 0.81437126 0.8172888 0.82291667 0.81568627 0.82329317 0.81404959] mean 
value: 0.819603006460176 key: test_recall value: [0.74545455 0.74545455 0.69090909 0.78181818 0.875 0.83928571 0.85714286 0.82142857 0.89090909 0.76363636] mean value: 0.8011038961038961 key: train_recall value: [0.80961924 0.7995992 0.78757515 0.8256513 0.81927711 0.83534137 0.79317269 0.83534137 0.82164329 0.78957916] mean value: 0.8116799864789821 key: test_roc_auc value: [0.76558442 0.76558442 0.7650974 0.80162338 0.81931818 0.80146104 0.82857143 0.80162338 0.85454545 0.80909091] mean value: 0.80125 key: train_roc_auc value: [0.81344416 0.81546225 0.80844621 0.82447224 0.81645218 0.82448431 0.811416 0.82348231 0.82264529 0.80460922] mean value: 0.8164914165680759 key: test_jcc value: [0.6119403 0.6119403 0.59375 0.66153846 0.71014493 0.68115942 0.71641791 0.67647059 0.75384615 0.66666667] mean value: 0.668387472557535 key: train_jcc value: [0.68474576 0.68439108 0.67294521 0.70187394 0.69035533 0.70389171 0.67753002 0.7027027 0.69846678 0.66893039] mean value: 0.6885832913576179 MCC on Blind test: 0.5 Accuracy on Blind test: 0.79 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.15152597 0.13697934 0.13371658 0.28721023 0.12940335 0.16948915 0.12985754 0.13179946 0.12889552 0.12940407] mean value: 0.15282812118530273 key: score_time value: [0.01140881 0.01162529 0.01171136 0.01240182 0.01138711 0.01143169 0.01129651 0.01137733 0.01139212 0.01139045] mean value: 0.011542248725891113 key: test_mcc value: [0.93038564 0.91368563 0.9461039 0.96459895 0.86471225 0.88077101 0.89414155 0.93029809 0.92973479 0.92973479] mean value: 0.9184166595432225 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96396396 0.95495495 0.97297297 0.98198198 0.92792793 0.93693694 0.94594595 0.96396396 0.96363636 0.96363636] mean value: 0.9575921375921376 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96491228 0.95652174 0.97297297 0.98214286 0.93333333 0.94117647 0.94827586 0.96551724 0.96491228 0.96491228] mean value: 0.9594677318721373 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93220339 0.91666667 0.96428571 0.96491228 0.875 0.88888889 0.91666667 0.93333333 0.93220339 0.93220339] mean value: 0.9256363720034549 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.98181818 1. 1. 1. 0.98214286 1. 1. 1. ] mean value: 0.9963961038961039 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96428571 0.95535714 0.97305195 0.98214286 0.92727273 0.93636364 0.94561688 0.96363636 0.96363636 0.96363636] mean value: 0.9575 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93220339 0.91666667 0.94736842 0.96491228 0.875 0.88888889 0.90163934 0.93333333 0.93220339 0.93220339] mean value: 0.9224419104397095 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.73 Accuracy on Blind test: 0.89 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.06950426 0.11023021 0.08584619 0.06061864 0.09362602 0.07184792 0.06709003 0.10699701 0.08673477 0.05997014] mean value: 0.08124651908874511 key: score_time value: [0.01922679 0.02629018 0.01244116 0.01238298 0.01918435 0.0123589 0.01421976 0.0239265 0.01261425 0.01244736] mean value: 0.016509222984313964 key: test_mcc value: [0.58827674 0.78434561 0.80286425 0.66058982 0.73247207 0.66597107 0.80802876 0.67720229 0.65465367 0.74743385] mean value: 0.7121838124804615 key: train_mcc value: [0.82706632 0.81733708 0.82049509 0.82325845 0.81464593 0.83301414 0.80679361 0.80033567 0.82344592 0.8154123 ] mean value: 0.818180451473204 key: test_accuracy value: [0.79279279 0.89189189 
0.9009009 0.82882883 0.86486486 0.82882883 0.9009009 0.83783784 0.82727273 0.87272727] mean value: 0.8546846846846847 key: train_accuracy value: [0.91273821 0.90772317 0.90872618 0.9107322 0.90672016 0.91574724 0.90270812 0.8996991 0.91082164 0.90681363] mean value: 0.908242965369053 key: test_fscore value: [0.8 0.89285714 0.89719626 0.83478261 0.87179487 0.84297521 0.90756303 0.84482759 0.82882883 0.87719298] mean value: 0.859801851434343 key: train_fscore value: [0.9154519 0.91085271 0.91258405 0.91367604 0.90909091 0.91812865 0.90536585 0.90196078 0.91367604 0.90979631] mean value: 0.9110583263662413 key: test_precision value: [0.76666667 0.87719298 0.92307692 0.8 0.83606557 0.78461538 0.85714286 0.81666667 0.82142857 0.84745763] mean value: 0.8330313252942346 key: train_precision value: [0.88867925 0.88180113 0.87638376 0.88533835 0.88571429 0.89204545 0.88045541 0.88122605 0.88533835 0.88157895] mean value: 0.8838560975791193 key: test_recall value: [0.83636364 0.90909091 0.87272727 0.87272727 0.91071429 0.91071429 0.96428571 0.875 0.83636364 0.90909091] mean value: 0.8897077922077922 key: train_recall value: [0.94388778 0.94188377 0.95190381 0.94388778 0.93373494 0.94578313 0.93172691 0.92369478 0.94388778 0.93987976] mean value: 0.9400270420358791 key: test_roc_auc value: [0.79318182 0.89204545 0.90064935 0.82922078 0.86444805 0.82808442 0.90032468 0.8375 0.82727273 0.87272727] mean value: 0.8545454545454545 key: train_roc_auc value: [0.91270694 0.90768887 0.90868283 0.91069891 0.90674723 0.91577734 0.9027372 0.89972314 0.91082164 0.90681363] mean value: 0.908239772718127 key: test_jcc value: [0.66666667 0.80645161 0.81355932 0.71641791 0.77272727 0.72857143 0.83076923 0.73134328 0.70769231 0.78125 ] mean value: 0.7555449035393881 key: train_jcc value: [0.84408602 0.83629893 0.83922261 0.84107143 0.83333333 0.84864865 0.82709447 0.82142857 0.84107143 0.83451957] mean value: 0.8366775026391152 MCC on Blind test: 0.59 Accuracy on Blind test: 0.83 Model_name: 
Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge 
ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01626134 0.01653409 0.01663375 0.01649666 0.01635027 0.01643872 0.01659584 0.01632452 0.01655459 0.01650786] mean value: 0.016469764709472656 key: score_time value: [0.01276565 0.01302242 0.01256537 0.01214433 0.01233745 0.01241207 0.0123868 0.01220846 0.01229262 0.012537 ] mean value: 0.012467217445373536 key: test_mcc value: [0.60383519 0.53417408 0.49641957 0.49561285 0.69373177 0.53199093 0.6576811 0.60409227 0.54626778 0.69378191] mean value: 0.5857587444320642 key: train_mcc value: [0.59486707 0.59700354 0.60695011 0.60321575 0.60327781 0.593109 0.59093327 0.60693429 0.60556194 0.57923276] mean value: 0.5981085545854256 key: test_accuracy value: [0.8018018 0.76576577 0.74774775 0.74774775 0.84684685 0.76576577 0.82882883 0.8018018 0.77272727 0.84545455] mean value: 0.7924488124488125 key: train_accuracy value: [0.79739218 0.79839519 0.80341023 0.80140421 0.80140421 0.79638917 0.79538616 0.80341023 0.80260521 0.78957916] mean value: 0.7989375943461647 key: test_fscore value: [0.7962963 0.75 0.73584906 0.74074074 0.84955752 0.76363636 0.83185841 0.8 0.77876106 0.83809524] mean value: 0.7884794686522855 key: train_fscore value: [0.7959596 0.79593909 0.80161943 0.79795918 0.79713115 0.79264556 0.79268293 
0.80121704 0.79918451 0.78787879] mean value: 0.796221726221148 key: test_precision value: [0.81132075 0.79591837 0.76470588 0.75471698 0.84210526 0.77777778 0.8245614 0.81481481 0.75862069 0.88 ] mean value: 0.8024541934463368 key: train_precision value: [0.80244399 0.80658436 0.80981595 0.81288981 0.81380753 0.80665281 0.80246914 0.80942623 0.81327801 0.79429735] mean value: 0.8071665181788477 key: test_recall value: [0.78181818 0.70909091 0.70909091 0.72727273 0.85714286 0.75 0.83928571 0.78571429 0.8 0.8 ] mean value: 0.7759415584415584 key: train_recall value: [0.78957916 0.78557114 0.79358717 0.78356713 0.7811245 0.77911647 0.78313253 0.79317269 0.78557114 0.78156313] mean value: 0.7855985062494467 key: test_roc_auc value: [0.80162338 0.76525974 0.7474026 0.74756494 0.84675325 0.76590909 0.82873377 0.80194805 0.77272727 0.84545455] mean value: 0.7923376623376623 key: train_roc_auc value: [0.79740002 0.79840806 0.80342009 0.80142212 0.80138389 0.79637186 0.79537388 0.80339997 0.80260521 0.78957916] mean value: 0.7989364270710095 key: test_jcc value: [0.66153846 0.6 0.58208955 0.58823529 0.73846154 0.61764706 0.71212121 0.66666667 0.63768116 0.72131148] mean value: 0.6525752418797988 key: train_jcc value: [0.66107383 0.66104553 0.66891892 0.66383701 0.66269165 0.65651438 0.65656566 0.66835871 0.6655348 0.65 ] mean value: 0.6614540497740491 MCC on Blind test: 0.48 Accuracy on Blind test: 0.78 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0310688 0.02964067 0.03017783 0.03573632 0.03960013 0.02891755 0.02455711 0.03152013 0.0423491 0.03097415] mean value: 0.032454180717468264 key: score_time value: [0.0123229 0.01248646 0.01263022 0.01230764 0.01214814 0.01245093 0.01247001 0.01256704 0.01229119 0.01259637] mean value: 0.012427091598510742 key: test_mcc value: [0.58827674 0.79230071 0.75530907 0.50764074 0.80188377 0.67564935 0.65445146 0.77224584 0.64051262 0.71334833] mean value: 0.6901618622911714 key: train_mcc value: [0.78427841 0.76899725 0.73866251 0.57031401 0.80658958 0.73431765 0.69871873 0.76303423 0.61951256 0.72764254] mean value: 0.721206747763299 key: test_accuracy value: [0.79279279 0.89189189 0.87387387 0.72972973 0.9009009 0.83783784 0.81981982 0.87387387 0.79090909 0.85454545] mean value: 0.8366175266175266 key: train_accuracy value: [0.88966901 0.88264794 0.86760281 0.75827482 0.90270812 0.86459378 0.84653962 0.87863591 0.78156313 0.86072144] mean value: 0.8532956585186421 key: test_fscore value: [0.8 0.89830508 0.8627451 0.65116279 0.90265487 0.83928571 0.83870968 0.88888889 0.82706767 0.84615385] mean value: 0.8354973636660027 key: train_fscore value: [0.89563567 0.88825215 0.86105263 0.69377382 0.8998968 0.85592316 0.85552408 0.88552507 0.81923715 0.85101822] mean value: 0.8505838757358823 key: test_precision value: [0.76666667 0.84126984 0.93617021 0.90322581 0.89473684 0.83928571 0.76470588 0.8 0.70512821 0.89795918] mean value: 0.8349148354699671 key: train_precision value: [0.85045045 0.84854015 0.90687361 0.94791667 0.92569002 
0.91343964 0.80748663 0.8372093 0.69872702 0.91474654] mean value: 0.8651080026739061 key: test_recall value: [0.83636364 0.96363636 0.8 0.50909091 0.91071429 0.83928571 0.92857143 1. 1. 0.8 ] mean value: 0.8587662337662337 key: train_recall value: [0.94589178 0.93186373 0.81963928 0.54709419 0.87550201 0.80522088 0.90963855 0.93975904 0.98997996 0.79559118] mean value: 0.8560180602168191 key: test_roc_auc value: [0.79318182 0.89253247 0.87321429 0.72775974 0.90081169 0.83782468 0.81883117 0.87272727 0.79090909 0.85454545] mean value: 0.8362337662337662 key: train_roc_auc value: [0.88961256 0.88259853 0.86765096 0.75848685 0.90268086 0.86453429 0.84660284 0.87869715 0.78156313 0.86072144] mean value: 0.853314862657041 key: test_jcc value: [0.66666667 0.81538462 0.75862069 0.48275862 0.82258065 0.72307692 0.72222222 0.8 0.70512821 0.73333333] mean value: 0.7229771921318083 key: train_jcc value: [0.81099656 0.79896907 0.75600739 0.5311284 0.81801126 0.74813433 0.74752475 0.79456706 0.69382022 0.74067164] mean value: 0.743983070132102 MCC on Blind test: 0.61 Accuracy on Blind test: 0.85 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03678513 0.04669952 0.03958678 0.0368154 0.04081035 0.04410028 0.03240681 0.03612375 0.04343247 0.0369277 ] mean value: 0.039368820190429685 key: score_time value: [0.0123539 0.01226425 0.01249218 0.01270509 0.01231241 0.01213574 0.0124712 0.01646805 0.01875472 0.01255918] mean value: 0.013451671600341797 key: test_mcc value: [0.57674936 0.27435929 0.4414112 0.61409543 0.84439989 0.71168831 0.78567192 0.44752365 0.7823356 0.73029674] mean value: 0.6208531392970608 key: train_mcc value: [0.61504983 0.35761898 0.45919707 0.70513165 0.81895676 0.80887671 0.8079072 0.49556275 0.81080701 0.82264753] mean value: 0.6701755481213304 key: test_accuracy value: [0.75675676 0.58558559 0.67567568 0.79279279 0.91891892 0.85585586 0.89189189 0.67567568 0.89090909 0.86363636] mean value: 0.7907698607698608 key: train_accuracy value: [0.77632899 0.61885657 0.68004012 0.84052156 0.90772317 0.90371113 0.90270812 0.70511535 0.90480962 0.90881764] mean value: 0.8148632269554154 key: test_fscore value: [0.8 0.3030303 0.52631579 0.75268817 0.92436975 0.85714286 0.89655172 0.53846154 0.88888889 0.86956522] mean value: 0.7357014238468678 key: train_fscore value: [0.81676253 0.39297125 0.53701016 0.81703107 0.91170825 0.90062112 0.90628019 0.58938547 0.90216272 0.91358025] mean value: 0.768751301189569 key: test_precision value: [0.675 0.90909091 0.95238095 0.92105263 0.87301587 0.85714286 0.86666667 0.95454545 0.90566038 0.83333333] mean value: 0.8747889055113485 key: train_precision value: [0.69220056 0.96850394 0.97368421 0.95945946 0.87316176 0.92948718 0.87337058 
0.96788991 0.9279661 0.86823105] mean value: 0.9033954742454171 key: test_recall value: [0.98181818 0.18181818 0.36363636 0.63636364 0.98214286 0.85714286 0.92857143 0.375 0.87272727 0.90909091] mean value: 0.7088311688311688 key: train_recall value: [0.99599198 0.24649299 0.37074148 0.71142285 0.95381526 0.87349398 0.94176707 0.42369478 0.87775551 0.96392786] mean value: 0.7359103749668011 key: test_roc_auc value: [0.75876623 0.58198052 0.67288961 0.7913961 0.91834416 0.85584416 0.89155844 0.67840909 0.89090909 0.86363636] mean value: 0.7903733766233766 key: train_roc_auc value: [0.77610844 0.61923043 0.68035066 0.84065118 0.90776935 0.90368086 0.90274726 0.70483336 0.90480962 0.90881764] mean value: 0.814899880081448 key: test_jcc value: [0.66666667 0.17857143 0.35714286 0.60344828 0.859375 0.75 0.8125 0.36842105 0.8 0.76923077] mean value: 0.616535605010537 key: train_jcc value: [0.69027778 0.2445328 0.36706349 0.69066148 0.8377425 0.81920904 0.82862191 0.41782178 0.8217636 0.84090909] mean value: 0.6558603479044525 MCC on Blind test: 0.63 Accuracy on Blind test: 0.84 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.30124474 0.28551722 0.28614354 0.28642416 0.2862277 0.28524876 0.28583145 0.28529334 0.28703308 0.28703952] mean value: 0.28760035037994386 key: score_time value: [0.01587796 0.01592803 0.01595163 0.01585817 0.01593447 0.01601052 0.01596475 0.01596737 0.01586986 0.0159719 ] mean value: 0.015933465957641602 key: test_mcc value: [0.82480596 0.85644694 0.856354 0.83798701 0.86471225 0.84111937 0.84111937 0.89704631 0.87402845 0.9104463 ] mean value: 0.8604065967920353 key: train_mcc value: [0.94009038 0.93823637 0.92852803 0.93240093 0.94057906 0.92815139 0.95000524 0.92424985 0.94621294 0.93734857] mean value: 0.9365802768993494 key: test_accuracy value: [0.90990991 0.92792793 0.92792793 0.91891892 0.92792793 0.91891892 0.91891892 0.94594595 0.93636364 0.95454545] mean value: 0.9287305487305487 key: train_accuracy value: [0.96990973 0.96890672 0.96389168 0.96589769 0.96990973 0.96389168 0.97492477 0.96188566 0.97294589 0.96793587] mean value: 0.9680099416485931 key: test_fscore value: [0.9137931 0.92857143 0.92592593 0.91891892 0.93333333 0.92307692 0.92307692 0.94915254 0.9380531 0.95575221] mean value: 0.9309654408459123 key: train_fscore value: [0.97029703 0.96939783 0.96463654 0.96653543 0.97047244 0.96435644 0.97512438 0.96245059 0.97329377 0.96881092] mean value: 0.9685375365555099 key: test_precision value: [0.86885246 0.9122807 0.94339623 0.91071429 0.875 0.8852459 0.8852459 0.90322581 0.9137931 0.93103448] mean value: 0.9028788868837357 key: train_precision value: [0.95890411 0.95525292 0.9460501 0.94970986 0.95173745 0.95117188 0.96646943 
0.94747082 0.9609375 0.943074 ] mean value: 0.9530778064480604 key: test_recall value: [0.96363636 0.94545455 0.90909091 0.92727273 1. 0.96428571 0.96428571 1. 0.96363636 0.98181818] mean value: 0.9619480519480519 key: train_recall value: [0.98196393 0.98396794 0.98396794 0.98396794 0.98995984 0.97791165 0.98393574 0.97791165 0.98597194 0.99599198] mean value: 0.9845550538828661 key: test_roc_auc value: [0.91038961 0.92808442 0.92775974 0.91899351 0.92727273 0.91850649 0.91850649 0.94545455 0.93636364 0.95454545] mean value: 0.9285876623376623 key: train_roc_auc value: [0.96989763 0.9688916 0.96387152 0.96587955 0.96992982 0.96390572 0.9749338 0.96190172 0.97294589 0.96793587] mean value: 0.9680093117962834 key: test_jcc value: [0.84126984 0.86666667 0.86206897 0.85 0.875 0.85714286 0.85714286 0.90322581 0.88333333 0.91525424] mean value: 0.8711104564812545 key: train_jcc value: [0.94230769 0.94061303 0.9316888 0.9352381 0.94263862 0.93116635 0.95145631 0.92761905 0.94797688 0.93950851] mean value: 0.9390213333766735 MCC on Blind test: 0.8 Accuracy on Blind test: 0.92 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.21933794 0.23009133 0.23146009 0.22986817 0.23455238 0.23486233 0.23559117 0.22639084 0.224648 0.22071886] mean value: 0.22875211238861085 key: score_time value: [0.03442264 0.04063106 0.02764821 0.03920436 0.03479433 0.04123449 0.03968191 0.03944087 0.02062654 0.03691602] mean value: 0.035460042953491214 key: test_mcc value: [0.86102173 0.86102173 0.94608644 0.91006494 0.83319558 0.89704631 0.89414155 0.89414155 0.92973479 0.91287093] mean value: 0.893932555011654 key: train_mcc value: [0.99198387 0.9939999 0.99197592 0.9900196 1. 
0.99599599 0.99200792 0.99599599 0.99201584 1. ] mean value: 0.9943995039269277 key: test_accuracy value: [0.92792793 0.92792793 0.97297297 0.95495495 0.90990991 0.94594595 0.94594595 0.94594595 0.96363636 0.95454545] mean value: 0.944971334971335 key: train_accuracy value: [0.99598796 0.99699097 0.99598796 0.99498495 1. 0.99799398 0.99598796 0.99799398 0.99599198 1. ] mean value: 0.9971919767317986 key: test_fscore value: [0.93103448 0.93103448 0.97247706 0.95495495 0.91803279 0.94915254 0.94827586 0.94827586 0.96491228 0.95652174] mean value: 0.9474672057920627 key: train_fscore value: [0.996 0.997003 0.99599198 0.99501496 1. 0.99799599 0.996 0.99799599 0.99600798 1. ] mean value: 0.9972009904105401 key: test_precision value: [0.8852459 0.8852459 0.98148148 0.94642857 0.84848485 0.90322581 0.91666667 0.91666667 0.93220339 0.91666667] mean value: 0.9132315900955711 key: train_precision value: [0.99401198 0.9940239 0.99599198 0.99007937 1. 0.996 0.99203187 0.996 0.99204771 1. ] mean value: 0.995018681570533 key: test_recall value: [0.98181818 0.98181818 0.96363636 0.96363636 1. 1. 0.98214286 0.98214286 1. 1. ] mean value: 0.9855194805194805 key: train_recall value: [0.99799599 1. 0.99599198 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9993987975951903 key: test_roc_auc value: [0.92840909 0.92840909 0.97288961 0.95503247 0.90909091 0.94545455 0.94561688 0.94561688 0.96363636 0.95454545] mean value: 0.9448701298701299 key: train_roc_auc value: [0.99598595 0.99698795 0.99598796 0.99497992 1. 0.99799599 0.99599198 0.99799599 0.99599198 1. ] mean value: 0.9971917731044418 key: test_jcc value: [0.87096774 0.87096774 0.94642857 0.9137931 0.84848485 0.90322581 0.90163934 0.90163934 0.93220339 0.91666667] mean value: 0.9006016558706041 key: train_jcc value: [0.99203187 0.9940239 0.99201597 0.99007937 1. 0.996 0.99203187 0.996 0.99204771 1. 
] mean value: 0.9944230696263322 MCC on Blind test: 0.68 Accuracy on Blind test: 0.87 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.45978689 0.58875775 0.49073029 0.46216607 0.47802901 0.42841339 0.42438769 0.4908936 0.44094849 0.39796805] mean value: 0.4662081241607666 key: score_time value: [0.02401781 0.04390836 0.04260111 0.04216743 0.02424955 0.04410362 0.04739928 0.04214454 0.02356482 0.0384655 ] mean value: 0.0372622013092041 key: test_mcc value: [0.83369955 0.76054489 0.77216596 0.73587873 0.75979502 0.81228039 0.75530907 0.76868784 0.86373129 0.82035423] mean value: 0.788244696340273 key: train_mcc value: [0.97408004 0.97211078 0.97014526 0.97408004 0.96613365 0.97204135 0.97219811 0.97006835 0.96821588 0.97009768] mean value: 0.9709171117615186 key: test_accuracy value: [0.90990991 0.87387387 0.88288288 0.86486486 0.87387387 0.9009009 0.87387387 0.88288288 0.92727273 0.90909091] mean value: 0.8899426699426699 key: train_accuracy value: [0.98696088 0.98595787 0.98495486 0.98696088 0.98294885 0.98595787 0.98595787 0.98495486 0.98396794 0.98496994] mean value: 0.985359183763716 key: test_fscore value: [0.91666667 0.88333333 0.88888889 0.87179487 0.8852459 
0.90909091 0.88333333 0.88888889 0.93220339 0.9122807 ] mean value: 0.8971726885221131 key: train_fscore value: [0.98709037 0.98611111 0.9851338 0.98709037 0.98311817 0.98605578 0.98611111 0.98507463 0.98415842 0.98510427] mean value: 0.9855048015415081 key: test_precision value: [0.84615385 0.81538462 0.83870968 0.82258065 0.81818182 0.84615385 0.828125 0.85245902 0.87301587 0.88135593] mean value: 0.8422120270067477 key: train_precision value: [0.97834646 0.97642436 0.9745098 0.97834646 0.97249509 0.97826087 0.9745098 0.97633136 0.97260274 0.97637795] mean value: 0.9758204894124628 key: test_recall value: [1. 0.96363636 0.94545455 0.92727273 0.96428571 0.98214286 0.94642857 0.92857143 1. 0.94545455] mean value: 0.9603246753246754 key: train_recall value: [0.99599198 0.99599198 0.99599198 0.99599198 0.9939759 0.9939759 0.99799197 0.9939759 0.99599198 0.99398798] mean value: 0.9953867574506443 key: test_roc_auc value: [0.91071429 0.87467532 0.88344156 0.86542208 0.87305195 0.90016234 0.87321429 0.88246753 0.92727273 0.90909091] mean value: 0.8899512987012987 key: train_roc_auc value: [0.98695182 0.9859478 0.98494378 0.98695182 0.9829599 0.98596591 0.98596993 0.9849639 0.98396794 0.98496994] mean value: 0.9853592727623922 key: test_jcc value: [0.84615385 0.79104478 0.8 0.77272727 0.79411765 0.83333333 0.79104478 0.8 0.87301587 0.83870968] mean value: 0.814014720194731 key: train_jcc value: [0.9745098 0.97260274 0.97070312 0.9745098 0.96679688 0.97249509 0.97260274 0.97058824 0.96881092 0.97064579] mean value: 0.9714265119740892 MCC on Blind test: 0.49 Accuracy on Blind test: 0.8 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear 
warnings.warn("Variables are collinear") Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.3042357 1.29457378 1.29446912 1.29047823 1.29637837 1.31723022 1.28884625 1.29490328 1.29829788 1.29580331] mean value: 1.2975216150283813 key: score_time value: [0.01003814 0.00967193 0.00972557 0.00951791 0.00977921 0.00974941 0.00971889 0.00971889 0.01020885 0.00968409] mean value: 0.009781289100646972 key: test_mcc value: [0.8972375 0.88102763 0.9461039 0.93038564 0.86471225 0.87733514 0.89414155 0.93029809 0.9104463 0.92973479] mean value: 0.9061422782542579 key: train_mcc value: [0.99200779 0.98605489 0.99200779 0.98407831 0.98803559 0.9900198 0.9900198 0.98605528 0.98206056 0.99002966] mean value: 0.9880369474583409 key: test_accuracy value: [0.94594595 0.93693694 0.97297297 0.96396396 0.92792793 0.93693694 0.94594595 0.96396396 0.95454545 0.96363636] mean value: 0.9512776412776413 key: train_accuracy value: [0.99598796 0.99297894 0.99598796 0.99197593 0.99398195 0.99498495 0.99498495 0.99297894 0.99098196 0.99498998] mean value: 0.9939833528642038 key: test_fscore value: [0.94827586 0.94017094 0.97297297 0.96491228 0.93333333 0.94017094 0.94827586 0.96551724 0.95575221 0.96491228] mean value: 0.9534293925958317 key: train_fscore value: [0.99600798 0.99303483 0.99600798 0.99204771 0.99401198 0.995005 0.995005 0.99302094 0.99104478 0.99501496] mean value: 0.9940201142152542 key: test_precision value: [0.90163934 0.88709677 0.96428571 0.93220339 0.875 0.90163934 0.91666667 0.93333333 0.93103448 0.93220339] mean value: 0.917510243942349 key: train_precision value: [0.99204771 0.98616601 0.99204771 0.98422091 0.98809524 0.99005964 
0.99005964 0.98613861 0.98418972 0.99007937] mean value: 0.9883104567288739 key: test_recall value: [1. 1. 0.98181818 1. 1. 0.98214286 0.98214286 1. 0.98181818 1. ] mean value: 0.9927922077922078 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99799599 1. ] mean value: 0.9997995991983968 key: test_roc_auc value: [0.94642857 0.9375 0.97305195 0.96428571 0.92727273 0.93652597 0.94561688 0.96363636 0.95454545 0.96363636] mean value: 0.95125 key: train_roc_auc value: [0.99598394 0.99297189 0.99598394 0.99196787 0.99398798 0.99498998 0.99498998 0.99298597 0.99098196 0.99498998] mean value: 0.9939833482225495 key: test_jcc value: [0.90163934 0.88709677 0.94736842 0.93220339 0.875 0.88709677 0.90163934 0.93333333 0.91525424 0.93220339] mean value: 0.9112835008246805 key: train_jcc value: [0.99204771 0.98616601 0.99204771 0.98422091 0.98809524 0.99005964 0.99005964 0.98613861 0.98224852 0.99007937] mean value: 0.988116336467864 MCC on Blind test: 0.77 Accuracy on Blind test: 0.91 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04222107 0.04232812 0.04297519 0.04269028 0.04910517 0.05046797 0.05291557 0.04299235 0.04354048 0.0433166 ] mean value: 0.045255279541015624 key: score_time value: [0.01371765 0.01335597 0.01350236 0.0135746 0.01405025 0.01356721 0.03361487 0.01649952 0.0135181 0.01359224] mean value: 0.015899276733398436 key: test_mcc value: [0.36094911 0.32868787 0.33903271 0.34503278 0.35489665 0.3790812 0.39886202 0.42911451 0.33333333 0.42754614] mean value: 0.36965363239191945 key: train_mcc value: [0.38761246 0.38589628 0.38417641 0.37725875 0.49294912 0.37476691 0.39703819 0.38340652 0.37318816 0.38357064] mean value: 0.3939863448644553 key: test_accuracy value: [0.61261261 0.59459459 0.61261261 0.6036036 0.64864865 0.63963964 0.63963964 0.65765766 0.6 0.65454545] mean value: 0.6263554463554464 key: train_accuracy value: [0.63089268 0.62988967 0.62888666 0.62487462 0.70210632 0.62286861 0.63590772 0.62788365 0.62224449 0.62825651] mean value: 0.6353810931793377 key: test_fscore value: [0.71895425 0.70967742 0.71523179 0.71428571 0.72727273 0.73333333 0.73684211 0.74666667 0.71428571 0.74324324] mean value: 0.7259792960150879 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") key: train_fscore value: [0.73060029 0.73006584 0.72953216 0.72740525 0.76814988 0.72594752 0.73289183 0.72860278 0.72581818 0.72899927] mean value: 0.7328013010149701 key: test_precision value: [0.56122449 0.55 0.5625 0.55555556 0.59770115 0.58510638 0.58333333
0.59574468 0.55555556 0.59139785] mean value: 0.5738118996957803 key: train_precision value: [0.57554787 0.57488479 0.57422325 0.57159221 0.62835249 0.56979405 0.57839721 0.5730725 0.5696347 0.57356322] mean value: 0.5789062286727364 key: test_recall value: [1. 1. 0.98181818 1. 0.92857143 0.98214286 1. 1. 1. 1. ] mean value: 0.9892532467532468 key: train_recall value: [1. 1. 1. 1. 0.98795181 1. 1. 1. 1. 1. ] mean value: 0.9987951807228915 key: test_roc_auc value: [0.61607143 0.59821429 0.61590909 0.60714286 0.6461039 0.63652597 0.63636364 0.65454545 0.6 0.65454545] mean value: 0.6265422077922078 key: train_roc_auc value: [0.63052209 0.62951807 0.62851406 0.62449799 0.70239274 0.62324649 0.63627255 0.62825651 0.62224449 0.62825651] mean value: 0.6353721499223346 key: test_jcc value: [0.56122449 0.55 0.55670103 0.55555556 0.57142857 0.57894737 0.58333333 0.59574468 0.55555556 0.59139785] mean value: 0.5699888435331252 key: train_jcc value: [0.57554787 0.57488479 0.57422325 0.57159221 0.62357414 0.56979405 0.57839721 0.5730725 0.5696347 0.57356322] mean value: 0.57842839407926 MCC on Blind test: 0.21 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02591252 0.02014327 0.01982069 0.04004431 0.04341722 0.04567528 0.02161407 0.01908636 0.019238 0.03228545] mean value: 0.028723716735839844 key: score_time value: [0.01445341 0.01340151 0.01611209 0.01954746 0.01933146 0.01936555 0.0124433 0.01254368 0.01254106 0.01970911] mean value: 0.015944862365722658 key: test_mcc value: [0.67762003 0.805216 0.7658331 0.69959151 0.8049036 0.68237361 0.76868784 0.71884134 0.74743385 0.78181818] mean value: 0.7452319057416088 key: train_mcc value: [0.79490575 0.80171572 0.81303246 0.79701682 0.81334074 0.80086148 0.79895075 0.79369889 0.79388846 0.7851833 ] mean value: 0.7992594371416237 key: test_accuracy value: [0.83783784 0.9009009 0.88288288 0.84684685 0.9009009 0.83783784 0.88288288 0.85585586 0.87272727 0.89090909] mean value: 0.870958230958231 key: train_accuracy value: [0.89669007 0.8996991 0.90471414 0.89769308 0.90571715 0.8996991 0.89869609 0.89568706 0.89579158 0.89178357] mean value: 0.8986170937662687 key: test_fscore value: [0.84210526 0.90434783 0.88073394 0.85470085 0.90598291 0.85 0.88888889 0.86666667 0.87719298 0.89090909] mean value: 0.8761528423803527 key: train_fscore value: [0.89990282 0.9034749 0.90909091 0.90097087 0.90873786 0.90253411 0.90165531 0.89941973 0.8996139 0.89514563] mean value: 0.9020546048367907 key: test_precision value: [0.81355932 0.86666667 0.88888889 0.80645161 0.86885246 0.796875 0.85245902 0.8125 0.84745763 0.89090909] mean value: 0.844461968393025 key: train_precision value: [0.87358491 0.87150838 0.86996337 0.87382298 0.87969925 0.87689394 0.87523629 0.86753731 
0.86778399 0.86817326] mean value: 0.87242036699792 key: test_recall value: [0.87272727 0.94545455 0.87272727 0.90909091 0.94642857 0.91071429 0.92857143 0.92857143 0.90909091 0.89090909] mean value: 0.9114285714285714 key: train_recall value: [0.92785571 0.93787575 0.95190381 0.92985972 0.93975904 0.92971888 0.92971888 0.93373494 0.93386774 0.9238477 ] mean value: 0.9338142147749314 key: test_roc_auc value: [0.83814935 0.9012987 0.88279221 0.8474026 0.90048701 0.83717532 0.88246753 0.85519481 0.87272727 0.89090909] mean value: 0.8708603896103897 key: train_roc_auc value: [0.89665878 0.89966077 0.90466676 0.89766078 0.90575126 0.89972918 0.89872717 0.89572519 0.89579158 0.89178357] mean value: 0.8986155041005707 key: test_jcc value: [0.72727273 0.82539683 0.78688525 0.74626866 0.828125 0.73913043 0.8 0.76470588 0.78125 0.80327869] mean value: 0.780231346094775 key: train_jcc value: [0.8180212 0.82394366 0.83333333 0.81978799 0.83274021 0.82238011 0.82092199 0.8172232 0.81754386 0.81019332] mean value: 0.8216088868355006 MCC on Blind test: 0.59 Accuracy on Blind test: 0.82 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.24418879 0.39535999 0.3969667 0.36512113 0.47421598 0.27459502 0.34953403 0.35271382 0.39017797 0.22438264] mean value: 0.346725606918335 key: score_time value: [0.01262951 0.02248549 0.01937032 0.01953912 0.01949883 0.01244497 0.01936412 0.01947474 0.02145219 0.02333951] mean value: 0.018959879875183105 key: test_mcc value: [0.67762003 0.76590909 0.7658331 0.69959151 0.8049036 0.68237361 0.76868784 0.71884134 0.74743385 0.78181818] mean value: 0.7413012150181251 key: train_mcc value: [0.79490575 0.80677119 0.81303246 0.79701682 0.81334074 0.80086148 0.79895075 0.79369889 0.79388846 0.7851833 ] mean value: 0.7997649837559726 key: test_accuracy value: [0.83783784 0.88288288 0.88288288 0.84684685 0.9009009 0.83783784 0.88288288 0.85585586 0.87272727 0.89090909] mean value: 0.8691564291564291 key: train_accuracy value: [0.89669007 0.90270812 0.90471414 0.89769308 0.90571715 0.8996991 0.89869609 0.89568706 0.89579158 0.89178357] mean value: 0.8989179964743931 key: test_fscore value: [0.84210526 0.88288288 0.88073394 0.85470085 0.90598291 0.85 0.88888889 0.86666667 0.87719298 0.89090909] mean value: 0.8740063480599454 key: train_fscore value: [0.89990282 0.90555015 0.90909091 0.90097087 0.90873786 0.90253411 0.90165531 0.89941973 0.8996139 0.89514563] mean value: 0.9022621290949477 key: test_precision value: [0.81355932 0.875 0.88888889 0.80645161 0.86885246 
0.796875 0.85245902 0.8125 0.84745763 0.89090909] mean value: 0.8452953017263584 key: train_precision value: [0.87358491 0.88068182 0.86996337 0.87382298 0.87969925 0.87689394 0.87523629 0.86753731 0.86778399 0.86817326] mean value: 0.873337710827275 key: test_recall value: [0.87272727 0.89090909 0.87272727 0.90909091 0.94642857 0.91071429 0.92857143 0.92857143 0.90909091 0.89090909] mean value: 0.905974025974026 key: train_recall value: [0.92785571 0.93186373 0.95190381 0.92985972 0.93975904 0.92971888 0.92971888 0.93373494 0.93386774 0.9238477 ] mean value: 0.9332130123701218 key: test_roc_auc value: [0.83814935 0.88295455 0.88279221 0.8474026 0.90048701 0.83717532 0.88246753 0.85519481 0.87272727 0.89090909] mean value: 0.869025974025974 key: train_roc_auc value: [0.89665878 0.90267885 0.90466676 0.89766078 0.90575126 0.89972918 0.89872717 0.89572519 0.89579158 0.89178357] mean value: 0.89891731253672 key: test_jcc value: [0.72727273 0.79032258 0.78688525 0.74626866 0.828125 0.73913043 0.8 0.76470588 0.78125 0.80327869] mean value: 0.7767239216196086 key: train_jcc value: [0.8180212 0.82740214 0.83333333 0.81978799 0.83274021 0.82238011 0.82092199 0.8172232 0.81754386 0.81019332] mean value: 0.8219547341614492 MCC on Blind test: 0.59 Accuracy on Blind test: 0.82 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04029536 0.04114866 0.03460741 0.02869368 0.06265116 0.03562665 0.03452373 0.04251862 0.0342226 0.02918482] mean value: 0.038347268104553224 key: score_time value: [0.01190257 0.01195717 0.01280403 0.01180339 0.01205182 0.01489973 0.01278424 0.01291847 0.01207256 0.01189375] mean value: 0.012508773803710937 key: test_mcc value: [0.95227002 0.80907152 0.7098505 0.75714286 0.76500781 0.70714286 0.8047619 0.67700771 0.40824829 0.8510645 ] mean value: 0.7441567958533553 key: train_mcc value: [0.82637697 0.80399267 0.82045348 0.80983264 0.82641807 0.8095776 0.8204801 0.84259028 0.8804868 0.80453796] mean value: 0.8244746575544734 key: test_accuracy value: [0.97560976 0.90243902 0.85365854 0.87804878 0.87804878 0.85365854 0.90243902 0.82926829 0.7 0.925 ] mean value: 0.8698170731707318 key: train_accuracy value: [0.91280654 0.90190736 0.91008174 0.90463215 0.91280654 0.90463215 0.91008174 0.92098093 0.94021739 0.90217391] mean value: 0.9120320459661178 key: test_fscore value: [0.97435897 0.9047619 0.84210526 0.87804878 0.87179487 0.85714286 0.9047619 0.85106383 0.72727273 0.92682927] mean value: 0.8738140381818856 key: train_fscore value: [0.91489362 0.90322581 0.91152815 0.90666667 0.9144385 0.90566038 0.91105121 0.92225201 0.94054054 0.90322581] mean value: 0.9133482690959911 key: test_precision value: [1. 
0.86363636 0.88888889 0.85714286 0.94444444 0.85714286 0.9047619 0.76923077 0.66666667 0.9047619 ] mean value: 0.8656676656676656 key: train_precision value: [0.89583333 0.89361702 0.8994709 0.89005236 0.89528796 0.89361702 0.89893617 0.90526316 0.93548387 0.89361702] mean value: 0.900117880984539 key: test_recall value: [0.95 0.95 0.8 0.9 0.80952381 0.85714286 0.9047619 0.95238095 0.8 0.95 ] mean value: 0.8873809523809524 key: train_recall value: [0.93478261 0.91304348 0.92391304 0.92391304 0.93442623 0.91803279 0.92349727 0.93989071 0.94565217 0.91304348] mean value: 0.9270194820622476 key: test_roc_auc value: [0.975 0.90357143 0.85238095 0.87857143 0.8797619 0.85357143 0.90238095 0.82619048 0.7 0.925 ] mean value: 0.8696428571428572 key: train_roc_auc value: [0.9127465 0.90187693 0.91004395 0.90457947 0.91286529 0.90466857 0.9101182 0.92103231 0.94021739 0.90217391] mean value: 0.9120322523164647 key: test_jcc value: [0.95 0.82608696 0.72727273 0.7826087 0.77272727 0.75 0.82608696 0.74074074 0.57142857 0.86363636] mean value: 0.7810588284501327 key: train_jcc value: [0.84313725 0.82352941 0.83743842 0.82926829 0.84236453 0.82758621 0.83663366 0.85572139 0.8877551 0.82352941] mean value: 0.8406963692117855 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.78683686 0.84267449 0.76527405 0.79202271 0.88664556 0.84797692 0.83675766 0.7930088 0.73645806 0.82136631] mean value: 0.8109021425247193 key: score_time value: [0.01321769 0.01195598 0.01218224 0.01203346 0.01193094 0.0118413 0.0120554 0.01197004 0.01200318 0.01233721] mean value: 0.012152743339538575 key: test_mcc value: [0.90238095 0.90692382 0.66432098 0.76500781 0.76500781 0.61152662 0.8047619 0.7197263 0.40824829 0.95118973] mean value: 0.7499094216289688 key: train_mcc value: [0.84255766 0.78218766 0.84206619 0.78774111 0.84228511 0.77715388 0.82581835 0.86440241 0.82628223 0.82652651] mean value: 0.8217021104343651 key: test_accuracy value: [0.95121951 0.95121951 0.82926829 0.87804878 0.87804878 0.80487805 0.90243902 0.85365854 0.7 0.975 ] mean value: 0.8723780487804877 key: train_accuracy value: [0.92098093 0.89100817 0.92098093 0.89373297 0.92098093 0.88828338 0.91280654 0.93188011 0.91304348 0.91304348] mean value: 0.9106740907475418 key: test_fscore value: 
[0.95 0.95238095 0.81081081 0.88372093 0.87179487 0.81818182 0.9047619 0.86956522 0.72727273 0.97560976] mean value: 0.8764098988924509 key: train_fscore value: [0.92266667 0.89247312 0.92183288 0.89544236 0.92183288 0.89008043 0.91351351 0.93297587 0.91397849 0.9144385 ] mean value: 0.9119234723468699 key: test_precision value: [0.95 0.90909091 0.88235294 0.82608696 0.94444444 0.7826087 0.9047619 0.8 0.66666667 0.95238095] mean value: 0.861839347069526 key: train_precision value: [0.90575916 0.88297872 0.9144385 0.88359788 0.90957447 0.87368421 0.90374332 0.91578947 0.90425532 0.9 ] mean value: 0.8993821058932191 key: test_recall value: [0.95 1. 0.75 0.95 0.80952381 0.85714286 0.9047619 0.95238095 0.8 1. 
] mean value: 0.8973809523809524 key: train_recall value: [0.94021739 0.90217391 0.92934783 0.9076087 0.93442623 0.90710383 0.92349727 0.95081967 0.92391304 0.92934783] mean value: 0.9248455690187694 key: test_roc_auc value: [0.95119048 0.95238095 0.82738095 0.8797619 0.8797619 0.80357143 0.90238095 0.85119048 0.7 0.975 ] mean value: 0.8722619047619047 key: train_roc_auc value: [0.92092837 0.89097767 0.92095807 0.89369506 0.92101746 0.88833452 0.91283559 0.93193158 0.91304348 0.91304348] mean value: 0.910676526490853 key: test_jcc value: [0.9047619 0.90909091 0.68181818 0.79166667 0.77272727 0.69230769 0.82608696 0.76923077 0.57142857 0.95238095] mean value: 0.7871499876934659 key: train_jcc value: [0.85643564 0.80582524 0.855 0.81067961 0.855 0.80193237 0.84079602 0.87437186 0.84158416 0.84236453] mean value: 0.8383989434715573 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01398969 0.01157308 0.00995636 0.00958562 0.00952148 0.00955033 0.00955057 0.00975442 0.01003146 0.00946712] mean value: 0.010298013687133789 key: score_time value: [0.01204395 0.00927138 0.00908518 0.00879693 0.0088644 0.00872564 0.00873661 0.00880122 0.0087173 0.00862312] mean value: 0.009166574478149414 key: test_mcc value: [0.66432098 0.47003614 0.80817439 0.51190476 0.53864117 0.51551459 0.49692935 0.4373371 0.10206207 0.80403025] mean value: 0.5348950810033143 key: train_mcc value: [0.53255474 0.53298192 0.57267693 0.58005934 0.55338634 0.55388717 0.57083084 0.55875455 0.62088704 0.56794151] mean value: 0.5643960359167794 key: test_accuracy value: [0.82926829 0.73170732 0.90243902 0.75609756 0.75609756 0.75609756 0.73170732 0.70731707 0.55 0.9 ] mean value: 0.7620731707317073 key: train_accuracy value: [0.76566757 0.76566757 0.78474114 0.78746594 0.77656676 0.77656676 0.78474114 0.77929155 0.80978261 0.7826087 ] mean value: 0.7813099751214311 key: test_fscore value: [0.81081081 0.74418605 0.89473684 0.75 0.72222222 0.75 0.68571429 0.66666667 0.59090909 0.9047619 ] mean value: 0.7520007869701872 key: train_fscore value: [0.75842697 0.75706215 0.77363897 0.77325581 0.77222222 0.76966292 0.77620397 0.77562327 0.81578947 0.77142857] mean value: 0.77433143190067 key: test_precision value: [0.88235294 0.69565217 0.94444444 0.75 0.86666667 0.78947368 0.85714286 0.8 0.54166667 0.86363636] mean value: 0.7991035797857039 key: train_precision value: [0.78488372 0.78823529 0.81818182 0.83125 0.78531073 0.79190751 0.80588235 0.78651685 0.79081633 0.81325301] mean value: 
0.7996237627596408 key: test_recall value: [0.75 0.8 0.85 0.75 0.61904762 0.71428571 0.57142857 0.57142857 0.65 0.95 ] mean value: 0.7226190476190476 key: train_recall value: [0.73369565 0.72826087 0.73369565 0.72282609 0.75956284 0.74863388 0.74863388 0.76502732 0.8423913 0.73369565] mean value: 0.7516423140888572 key: test_roc_auc value: [0.82738095 0.73333333 0.90119048 0.75595238 0.75952381 0.75714286 0.73571429 0.71071429 0.55 0.9 ] mean value: 0.7630952380952382 key: train_roc_auc value: [0.76575493 0.76576978 0.78488061 0.78764255 0.77652055 0.77649085 0.78464303 0.77925279 0.80978261 0.7826087 ] mean value: 0.7813346400570207 key: test_jcc value: [0.68181818 0.59259259 0.80952381 0.6 0.56521739 0.6 0.52173913 0.5 0.41935484 0.82608696] mean value: 0.611633290090513 key: train_jcc value: [0.61085973 0.60909091 0.63084112 0.63033175 0.62895928 0.62557078 0.63425926 0.63348416 0.68888889 0.62790698] mean value: 0.6320192852709595 MCC on Blind test: 0.43 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01049089 0.01060438 0.01087594 0.01089048 0.01076055 0.01004434 0.01068139 0.01085758 0.0104301 0.01090169] mean value: 0.01065373420715332 key: score_time value: [0.00968719 0.00909781 0.00950837 0.0095396 0.00881219 0.00950313 0.00953436 0.00951672 0.00924182 0.00943446] mean value: 0.009387564659118653 key: test_mcc value: [0.7633652 0.75714286 0.7633652 0.56086079 0.49692935 0.6133669 0.71121921 0.56086079 0. 0.7 ] mean value: 0.5927110280741841 key: train_mcc value: [0.68990534 0.61522034 0.71127744 0.70268762 0.67492212 0.70148323 0.69678894 0.69625716 0.70409854 0.69104454] mean value: 0.6883685277058793 key: test_accuracy value: [0.87804878 0.87804878 0.87804878 0.7804878 0.73170732 0.80487805 0.85365854 0.7804878 0.5 0.85 ] mean value: 0.7935365853658537 key: train_accuracy value: [0.84468665 0.80653951 0.85558583 0.85013624 0.83651226 0.85013624 0.84741144 0.84741144 0.85054348 0.8451087 ] mean value: 0.8434071792441654 key: test_fscore value: [0.86486486 0.87804878 0.86486486 0.76923077 0.68571429 0.8 0.85 0.79069767 0.47368421 0.85 ] mean value: 0.782710545010751 key: train_fscore value: [0.84210526 0.79886686 0.85479452 0.84419263 0.82954545 0.84507042 0.84090909 0.84180791 0.84330484 0.84122563] mean value: 0.8381822621430892 key: test_precision value: [0.94117647 0.85714286 0.94117647 0.78947368 0.85714286 0.84210526 0.89473684 0.77272727 0.5 0.85 ] mean value: 0.8245681717663141 key: train_precision value: [0.85875706 0.83431953 0.86187845 0.8816568 0.86390533 0.87209302 0.87573964 0.87134503 0.88622754 0.86285714] mean value: 0.8668779557223617 key: 
test_recall value: [0.8 0.9 0.8 0.75 0.57142857 0.76190476 0.80952381 0.80952381 0.45 0.85 ] mean value: 0.7502380952380953 key: train_recall value: [0.82608696 0.76630435 0.84782609 0.80978261 0.79781421 0.81967213 0.80874317 0.81420765 0.80434783 0.82065217] mean value: 0.8115437158469946 key: test_roc_auc value: [0.87619048 0.87857143 0.87619048 0.7797619 0.73571429 0.80595238 0.8547619 0.7797619 0.5 0.85 ] mean value: 0.7936904761904762 key: train_roc_auc value: [0.84473747 0.80664944 0.85560703 0.8502465 0.8364071 0.85005346 0.84730637 0.84732122 0.85054348 0.8451087 ] mean value: 0.8433980755523878 key: test_jcc value: [0.76190476 0.7826087 0.76190476 0.625 0.52173913 0.66666667 0.73913043 0.65384615 0.31034483 0.73913043] mean value: 0.6562275867560725 key: train_jcc value: [0.72727273 0.66509434 0.74641148 0.73039216 0.70873786 0.73170732 0.7254902 0.72682927 0.72906404 0.72596154] mean value: 0.7216960930404063 MCC on Blind test: 0.54 Accuracy on Blind test: 0.8 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01083994 0.01040053 0.01038337 0.00928807 0.01022983 0.01036191 0.01036453 0.01052999 0.0103054 0.00929523] mean value: 0.010199880599975586 key: score_time value: [0.01751852 0.01245236 0.01286888 0.01144576 0.01214862 0.0126729 0.01281142 0.01234031 0.01376772 0.01659131] mean value: 0.013461780548095704 key: test_mcc value: [ 0.47003614 0.56836003 0.2681441 0.41487884 0.17506448 0.46428571 0.51551459 0.31960727 -0.05057217 0.60302269] mean value: 0.3748341704030617 key: train_mcc value: [0.60767426 0.57504013 0.5924094 0.6304166 0.62959664 0.59166588 0.57083084 0.61348256 0.63047203 0.60873161] mean value: 0.6050319937989842 key: test_accuracy value: [0.73170732 0.7804878 0.63414634 0.70731707 0.58536585 0.73170732 0.75609756 0.65853659 0.475 0.8 ] mean value: 0.6860365853658537 key: train_accuracy value: [0.80381471 0.78746594 0.79564033 0.8147139 0.8147139 0.79564033 0.78474114 0.80653951 0.81521739 0.80434783] mean value: 0.8022834972159697 key: test_fscore value: [0.74418605 0.79069767 0.59459459 0.68421053 0.56410256 0.73170732 0.75 0.69565217 0.43243243 0.78947368] mean value: 0.6777057013572354 key: train_fscore value: [0.80327869 0.79032258 0.78991597 0.81005587 0.81621622 0.79108635 0.77620397 0.80222841 0.81420765 0.80540541] mean value: 0.7998921102609804 key: test_precision value: [0.69565217 0.73913043 0.64705882 0.72222222 0.61111111 0.75 0.78947368 0.64 0.47058824 0.83333333] mean value: 0.6898570018396375 key: train_precision value: [0.80769231 0.78191489 0.8150289 0.83333333 0.80748663 0.80681818 0.80588235 0.81818182 0.81868132 0.80107527] 
mean value: 0.8096095007832509 key: test_recall value: [0.8 0.85 0.55 0.65 0.52380952 0.71428571 0.71428571 0.76190476 0.4 0.75 ] mean value: 0.6714285714285715 key: train_recall value: [0.79891304 0.79891304 0.76630435 0.78804348 0.82513661 0.77595628 0.74863388 0.78688525 0.80978261 0.80978261] mean value: 0.7908351152292706 key: test_roc_auc value: [0.73333333 0.78214286 0.63214286 0.70595238 0.58690476 0.73214286 0.75714286 0.65595238 0.475 0.8 ] mean value: 0.6860714285714287 key: train_roc_auc value: [0.80382811 0.78743466 0.79572048 0.81478677 0.81474222 0.79558684 0.78464303 0.8064861 0.81521739 0.80434783] mean value: 0.8022793418864338 key: test_jcc value: [0.59259259 0.65384615 0.42307692 0.52 0.39285714 0.57692308 0.6 0.53333333 0.27586207 0.65217391] mean value: 0.5220665204638218 key: train_jcc value: [0.67123288 0.65333333 0.65277778 0.68075117 0.68949772 0.65437788 0.63425926 0.66976744 0.68663594 0.67420814] mean value: 0.6666841549228235 MCC on Blind test: 0.4 Accuracy on Blind test: 0.74 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, 
gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01670408 0.01670146 0.01674366 0.01667166 0.01650667 0.01689339 0.01668477 0.01658773 0.01667404 0.01697803] mean value: 0.01671454906463623 key: score_time value: [0.01076055 0.01121688 0.01057029 0.0105505 0.01071262 0.01083326 0.01067638 0.01060128 0.01066852 0.01076341] mean value: 0.010735368728637696 key: test_mcc value: [0.90238095 0.95238095 0.8047619 0.70714286 0.71121921 0.65871309 0.75714286 0.67700771 0.20100756 0.8510645 ] mean value: 0.7222821594970632 key: train_mcc value: [0.73381575 0.75529095 0.76091136 0.77150768 0.77800027 0.77131232 0.75506509 0.77667885 0.80989026 0.75070989] mean value: 0.7663182423175936 key: test_accuracy value: [0.95121951 0.97560976 0.90243902 0.85365854 0.85365854 0.82926829 0.87804878 0.82926829 0.6 0.925 ] mean value: 0.8598170731707317 key: train_accuracy value: [0.86648501 0.8773842 0.88010899 0.88555858 0.88828338 0.88555858 0.8773842 0.88828338 0.9048913 0.875 ] mean value: 0.8828937625873712 key: test_fscore value: [0.95 0.97560976 0.9 0.85 0.85 0.8372093 0.87804878 0.85106383 0.61904762 0.92682927] mean value: 0.8637808556038483 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( key: train_fscore value: [0.87002653 0.88 0.88297872 0.88770053 0.89124668 0.88648649 0.8787062 0.88888889 0.90566038 0.87765957] mean value: 0.8849353994375553 key: test_precision value: [0.95 0.95238095 0.9 0.85 0.89473684 0.81818182 0.9 0.76923077 0.59090909 0.9047619 ] mean value: 0.8530201377569798 key: train_precision value: [0.84974093 0.86387435 0.86458333 0.87368421 0.86597938 0.87700535 0.86702128 0.88172043 0.89839572 0.859375 ] mean value: 0.8701379979717162 key: test_recall value: [0.95 1. 0.9 0.85 0.80952381 0.85714286 0.85714286 0.95238095 0.65 0.95 ] mean value: 0.8776190476190476 key: train_recall value: [0.89130435 0.89673913 0.90217391 0.90217391 0.91803279 0.89617486 0.89071038 0.89617486 0.91304348 0.89673913] mean value: 0.9003266809218342 key: test_roc_auc value: [0.95119048 0.97619048 0.90238095 0.85357143 0.8547619 0.82857143 0.87857143 0.82619048 0.6 0.925 ] mean value: 0.8596428571428572 key: train_roc_auc value: [0.8664172 0.87733131 0.88004871 0.88551319 0.88836422 0.88558743 0.87742041 0.88830482 0.9048913 0.875 ] mean value: 0.882887859349014 key: test_jcc value: [0.9047619 0.95238095 0.81818182 0.73913043 0.73913043 0.72 0.7826087 0.74074074 0.44827586 0.86363636] mean value: 0.7708847206988136 key: train_jcc value: [0.76995305 0.78571429 0.79047619 0.79807692 0.80382775 0.7961165 0.78365385 0.8 0.82758621 0.78199052] mean value: 0.7937395281338545 MCC on Blind test: 0.62 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.29302669 1.45275974 1.41800356 1.26620889 1.44646573 1.27663016 1.4107101 1.35374713 1.28098321 1.41262245] mean value: 1.3611157655715942 key: score_time value: [0.01493073 0.01485372 0.01487899 0.01501226 0.01526451 0.01565886 0.01561046 0.01576424 0.01891017 0.01555133] mean value: 0.01564352512359619 key: test_mcc value: [0.75714286 0.76500781 0.70714286 0.85441771 0.66668392 0.6133669 0.71121921 0.57570364 0.464758 0.8510645 ] mean value: 0.6966507394206809 key: train_mcc value: [0.99456506 0.98366595 0.98910074 0.98366595 0.98366547 0.98366547 0.98366547 1. 0.98913043 0.98913043] mean value: 0.9880254978024888 key: test_accuracy value: [0.87804878 0.87804878 0.85365854 0.92682927 0.82926829 0.80487805 0.85365854 0.7804878 0.725 0.925 ] mean value: 0.8454878048780488 key: train_accuracy value: [0.9972752 0.99182561 0.99455041 0.99182561 0.99182561 0.99182561 0.99182561 1. 0.99456522 0.99456522] mean value: 0.9940084113256723 key: test_fscore value: [0.87804878 0.88372093 0.85 0.92307692 0.82051282 0.8 0.85 0.80851064 0.75555556 0.92682927] mean value: 0.8496254916456217 key: train_fscore value: [0.99728997 0.99182561 0.99456522 0.99182561 0.99178082 0.99178082 0.99178082 1. 0.99456522 0.99456522] mean value: 0.9939979316985105 key: test_precision value: [0.85714286 0.82608696 0.85 0.94736842 0.88888889 0.84210526 0.89473684 0.73076923 0.68 0.9047619 ] mean value: 0.842186036440041 key: train_precision value: [0.99459459 0.99453552 0.99456522 0.99453552 0.99450549 0.99450549 0.99450549 1. 
0.99456522 0.99456522] mean value: 0.9950877768536357 key: test_recall value: [0.9 0.95 0.85 0.9 0.76190476 0.76190476 0.80952381 0.9047619 0.85 0.95 ] mean value: 0.8638095238095238 key: train_recall value: [1. 0.98913043 0.99456522 0.98913043 0.98907104 0.98907104 0.98907104 1. 0.99456522 0.99456522] mean value: 0.9929169636493229 key: test_roc_auc value: [0.87857143 0.8797619 0.85357143 0.92619048 0.83095238 0.80595238 0.8547619 0.77738095 0.725 0.925 ] mean value: 0.8457142857142858 key: train_roc_auc value: [0.99726776 0.99183298 0.99455037 0.99183298 0.99181813 0.99181813 0.99181813 1. 0.99456522 0.99456522] mean value: 0.9940068899976241 key: test_jcc value: [0.7826087 0.79166667 0.73913043 0.85714286 0.69565217 0.66666667 0.73913043 0.67857143 0.60714286 0.86363636] mean value: 0.7421348578957274 key: train_jcc value: [0.99459459 0.98378378 0.98918919 0.98378378 0.98369565 0.98369565 0.98369565 1. 0.98918919 0.98918919] mean value: 0.9880816686251469 MCC on Blind test: 0.63 Accuracy on Blind test: 0.84 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03136277 0.02270293 0.02259898 0.02231073 0.02122474 0.02493143 0.02263188 0.02233434 0.02839565 0.02085257] mean value: 0.023934602737426758 key: score_time value: [0.01207113 0.00898671 0.00876594 0.00866342 0.00875425 0.01139832 0.0089736 0.00955963 0.00888205 0.00883341] mean value: 0.009488844871520996 key: test_mcc value: [1. 0.85441771 0.7098505 0.90238095 0.8047619 0.80817439 0.90649828 0.90649828 0.65743826 0.70352647] mean value: 0.8253546749346465 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.92682927 0.85365854 0.95121951 0.90243902 0.90243902 0.95121951 0.95121951 0.825 0.85 ] mean value: 0.9114024390243902 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.92307692 0.84210526 0.95 0.9047619 0.90909091 0.95454545 0.95454545 0.8372093 0.85714286] mean value: 0.913247806864698 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.94736842 0.88888889 0.95 0.9047619 0.86956522 0.91304348 0.91304348 0.7826087 0.81818182] mean value: 0.8987461902450461 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.9 0.8 0.95 0.9047619 0.95238095 1. 1. 0.9 0.9 ] mean value: 0.9307142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92619048 0.85238095 0.95119048 0.90238095 0.90119048 0.95 0.95 0.825 0.85 ] mean value: 0.9108333333333333 key: train_roc_auc value: [1. 1. 1. 1. 
1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.85714286 0.72727273 0.9047619 0.82608696 0.83333333 0.91304348 0.91304348 0.72 0.75 ] mean value: 0.8444684735554301 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.69 Accuracy on Blind test: 0.87 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11634302 0.11963344 0.12352252 0.12230539 0.11753988 0.12038803 0.12010169 0.12309289 0.11572337 0.12082148] mean value: 0.11994717121124268 key: score_time value: [0.01868987 0.01897264 0.01911902 0.01916981 0.019449 0.01901913 0.01787734 0.01967764 0.01787829 0.01886177] mean value: 0.018871450424194337 key: test_mcc value: [0.85441771 0.85441771 0.80817439 0.70714286 0.76500781 0.80817439 0.8047619 0.57570364 0.40824829 0.75858261] mean value: 0.7344631301111129 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92682927 0.92682927 0.90243902 0.85365854 0.87804878 0.90243902 0.90243902 0.7804878 0.7 0.875 ] mean value: 0.8648170731707318 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.92307692 0.92307692 0.89473684 0.85 0.87179487 0.90909091 0.9047619 0.80851064 0.72727273 0.88372093] mean value: 0.8696042669709952 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94736842 0.94736842 0.94444444 0.85 0.94444444 0.86956522 0.9047619 0.73076923 0.66666667 0.82608696] mean value: 0.8631475707104997 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9 0.9 0.85 0.85 0.80952381 0.95238095 0.9047619 0.9047619 0.8 0.95 ] mean value: 0.8821428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92619048 0.92619048 0.90119048 0.85357143 0.8797619 0.90119048 0.90238095 0.77738095 0.7 0.875 ] mean value: 0.8642857142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85714286 0.85714286 0.80952381 0.73913043 0.77272727 0.83333333 0.82608696 0.67857143 0.57142857 0.79166667] mean value: 0.7736754187841144 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.64 Accuracy on Blind test: 0.84 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01135373 0.0111177 0.01112485 0.01120734 0.01128149 0.0112555 0.01124287 0.01118231 0.01134777 0.01073694] mean value: 0.011185050010681152 key: score_time value: [0.0094831 0.00955534 0.0096519 0.0088768 0.00959039 0.00959277 0.00955296 0.00955844 0.00960636 0.00904131] mean value: 0.009450936317443847 key: test_mcc value: [0.51320273 0.36666667 0.51190476 0.51966679 0.41487884 0.61152662 0.65952381 0.57570364 0.25031309 0.45056356] mean value: 0.4873950495640369 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75609756 0.68292683 0.75609756 0.75609756 0.70731707 0.80487805 0.82926829 0.7804878 0.625 0.725 ] mean value: 0.7423170731707317 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.68292683 0.75 0.72222222 0.72727273 0.81818182 0.82926829 0.80851064 0.61538462 0.71794872] mean value: 0.7408557966522351 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.77777778 0.66666667 0.75 0.8125 0.69565217 0.7826087 0.85 0.73076923 0.63157895 0.73684211] mean value: 0.7434395597410471 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.7 0.7 0.75 0.65 0.76190476 0.85714286 0.80952381 0.9047619 0.6 0.7 ] mean value: 0.7433333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.7547619 0.68333333 0.75595238 0.75357143 0.70595238 0.80357143 0.8297619 0.77738095 0.625 0.725 ] mean value: 0.7414285714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.51851852 0.6 0.56521739 0.57142857 0.69230769 0.70833333 0.67857143 0.44444444 0.56 ] mean value: 0.592215471324167 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.41 Accuracy on Blind test: 0.7 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.70994759 1.66301012 1.64955759 1.68670607 1.65594816 1.65043163 1.63158131 1.61386442 1.59506416 1.65376139] mean value: 1.6509872436523438 key: score_time value: [0.09827518 0.09878492 0.09821153 0.091434 0.09107804 0.09389925 0.09783268 0.09119153 0.09594035 0.09461141] mean value: 0.09512588977813721 key: test_mcc value: [0.90238095 0.90238095 0.90238095 0.85441771 0.80907152 0.80817439 0.95227002 0.7633652 0.55629391 0.75858261] mean value: 0.8209318202983664 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95121951 0.95121951 0.95121951 0.92682927 0.90243902 0.90243902 0.97560976 0.87804878 0.775 0.875 ] mean value: 0.9089024390243903 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95 0.95 0.95 0.92307692 0.9 0.90909091 0.97674419 0.88888889 0.79069767 0.88372093] mean value: 0.9122219511754396 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95 0.95 0.95 0.94736842 0.94736842 0.86956522 0.95454545 0.83333333 0.73913043 0.82608696] mean value: 0.8967398238679702 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.95 0.95 0.95 0.9 0.85714286 0.95238095 1. 0.95238095 0.85 0.95 ] mean value: 0.9311904761904761 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.95119048 0.95119048 0.95119048 0.92619048 0.90357143 0.90119048 0.975 0.87619048 0.775 0.875 ] mean value: 0.9085714285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9047619 0.9047619 0.9047619 0.85714286 0.81818182 0.83333333 0.95454545 0.8 0.65384615 0.79166667] mean value: 0.8423001998001998 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.76 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.9235487 0.93965459 0.92365193 1.00056052 0.96404266 0.9338429 0.99391007 0.94137692 0.92965555 0.90762067] mean value: 0.945786452293396 key: score_time value: [0.24140787 0.21806431 0.22907925 0.201998 0.23001051 0.21210074 0.33192515 0.26397443 0.2779243 0.21721077] mean value: 0.24236953258514404 key: test_mcc value: [0.90238095 0.95238095 0.90238095 0.85441771 0.76500781 0.80817439 0.95227002 0.7633652 0.45056356 0.75858261] mean value: 0.8109524139209209 key: train_mcc value: [0.91859058 0.91859058 0.92927584 0.92392019 0.91860262 0.913028 0.90212679 0.92928213 0.94023128 0.9135293 ] mean value: 0.9207177296240489 key: test_accuracy value: [0.95121951 0.97560976 0.95121951 0.92682927 0.87804878 0.90243902 0.97560976 0.87804878 0.725 0.875 ] mean value: 0.9039024390243903 key: train_accuracy value: [0.95912807 0.95912807 0.96457766 0.96185286 0.95912807 0.95640327 0.95095368 0.96457766 0.9701087 0.95652174] mean value: 0.9602379753583699 key: test_fscore value: [0.95 0.97560976 0.95 0.92307692 0.87179487 0.90909091 0.97674419 0.88888889 0.73170732 0.88372093] mean value: 0.9060633782301395 key: train_fscore value: [0.95978552 0.95978552 0.96495957 0.96236559 0.95956873 0.95675676 0.95135135 0.96476965 0.9701897 0.95721925] mean value: 0.9606751647899552 key: test_precision value: [0.95 0.95238095 0.95 0.94736842 0.94444444 0.86956522 0.95454545 0.83333333 0.71428571 0.82608696] mean value: 0.8942010493955574 key: train_precision value: [0.94708995 0.94708995 0.95721925 0.95212766 0.94680851 
0.94652406 0.94117647 0.95698925 0.96756757 0.94210526] mean value: 0.9504697928526207 key: test_recall value: [0.95 1. 0.95 0.9 0.80952381 0.95238095 1. 0.95238095 0.75 0.95 ] mean value: 0.9214285714285714 key: train_recall value: [0.97282609 0.97282609 0.97282609 0.97282609 0.9726776 0.96721311 0.96174863 0.9726776 0.97282609 0.97282609] mean value: 0.971127346162984 key: test_roc_auc value: [0.95119048 0.97619048 0.95119048 0.92619048 0.8797619 0.90119048 0.975 0.87619048 0.725 0.875 ] mean value: 0.9036904761904762 key: train_roc_auc value: [0.95909064 0.95909064 0.96455512 0.96182288 0.95916488 0.95643264 0.95098301 0.96459967 0.9701087 0.95652174] mean value: 0.9602369921596579 key: test_jcc value: [0.9047619 0.95238095 0.9047619 0.85714286 0.77272727 0.83333333 0.95454545 0.8 0.57692308 0.79166667] mean value: 0.8348243423243423 key: train_jcc value: [0.92268041 0.92268041 0.93229167 0.92746114 0.92227979 0.91709845 0.90721649 0.93193717 0.94210526 0.91794872] mean value: 0.9243699518374119 MCC on Blind test: 0.78 Accuracy on Blind test: 0.91 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02452135 0.01085424 0.01013875 0.01106048 0.01107907 0.01072407 0.01107764 0.01099753 0.01002288 0.01183772] mean value: 0.01223137378692627 key: score_time value: [0.01152945 0.00886941 0.00989628 0.00964642 0.00978494 0.00939059 0.00967145 0.00969195 0.00964117 0.01014018] mean value: 0.009826183319091797 key: test_mcc value: [0.7633652 0.75714286 0.7633652 0.56086079 0.49692935 0.6133669 0.71121921 0.56086079 0. 0.7 ] mean value: 0.5927110280741841 key: train_mcc value: [0.68990534 0.61522034 0.71127744 0.70268762 0.67492212 0.70148323 0.69678894 0.69625716 0.70409854 0.69104454] mean value: 0.6883685277058793 key: test_accuracy value: [0.87804878 0.87804878 0.87804878 0.7804878 0.73170732 0.80487805 0.85365854 0.7804878 0.5 0.85 ] mean value: 0.7935365853658537 key: train_accuracy value: [0.84468665 0.80653951 0.85558583 0.85013624 0.83651226 0.85013624 0.84741144 0.84741144 0.85054348 0.8451087 ] mean value: 0.8434071792441654 key: test_fscore value: [0.86486486 0.87804878 0.86486486 0.76923077 0.68571429 0.8 0.85 0.79069767 0.47368421 0.85 ] mean value: 0.782710545010751 key: train_fscore value: [0.84210526 0.79886686 0.85479452 0.84419263 0.82954545 0.84507042 0.84090909 0.84180791 0.84330484 0.84122563] mean value: 0.8381822621430892 key: test_precision value: [0.94117647 0.85714286 0.94117647 0.78947368 0.85714286 0.84210526 0.89473684 0.77272727 0.5 0.85 ] mean value: 0.8245681717663141 key: train_precision value: [0.85875706 0.83431953 0.86187845 0.8816568 0.86390533 0.87209302 0.87573964 0.87134503 0.88622754 0.86285714] mean value: 0.8668779557223617 key: 
test_recall value: [0.8 0.9 0.8 0.75 0.57142857 0.76190476 0.80952381 0.80952381 0.45 0.85 ] mean value: 0.7502380952380953 key: train_recall value: [0.82608696 0.76630435 0.84782609 0.80978261 0.79781421 0.81967213 0.80874317 0.81420765 0.80434783 0.82065217] mean value: 0.8115437158469946 key: test_roc_auc value: [0.87619048 0.87857143 0.87619048 0.7797619 0.73571429 0.80595238 0.8547619 0.7797619 0.5 0.85 ] mean value: 0.7936904761904762 key: train_roc_auc value: [0.84473747 0.80664944 0.85560703 0.8502465 0.8364071 0.85005346 0.84730637 0.84732122 0.85054348 0.8451087 ] mean value: 0.8433980755523878 key: test_jcc value: [0.76190476 0.7826087 0.76190476 0.625 0.52173913 0.66666667 0.73913043 0.65384615 0.31034483 0.73913043] mean value: 0.6562275867560725 key: train_jcc value: [0.72727273 0.66509434 0.74641148 0.73039216 0.70873786 0.73170732 0.7254902 0.72682927 0.72906404 0.72596154] mean value: 0.7216960930404063 MCC on Blind test: 0.54 Accuracy on Blind test: 0.8 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07918167 0.07073474 0.07140946 0.07169986 0.06777453 0.06753397 0.06917024 0.07282233 0.23122311 0.0630393 ] mean value: 0.08645892143249512 key: score_time value: [0.01080298 0.01173496 0.01048946 0.0110333 0.01068783 0.01041865 0.01061344 0.01113892 0.01103044 0.01089954] mean value: 0.010884952545166016 key: test_mcc value: [1. 0.95238095 0.8047619 0.90238095 0.8547619 0.90649828 0.95227002 0.86240942 0.75093926 0.8510645 ] mean value: 0.8837467182165777 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97560976 0.90243902 0.95121951 0.92682927 0.95121951 0.97560976 0.92682927 0.875 0.925 ] mean value: 0.9409756097560975 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.97560976 0.9 0.95 0.92682927 0.95454545 0.97674419 0.93333333 0.87804878 0.92682927] mean value: 0.9421940047096031 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95238095 0.9 0.95 0.95 0.91304348 0.95454545 0.875 0.85714286 0.9047619 ] mean value: 0.9256874647092038 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.9 0.95 0.9047619 1. 1. 1. 0.9 0.95 ] mean value: 0.9604761904761905 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.97619048 0.90238095 0.95119048 0.92738095 0.95 0.975 0.925 0.875 0.925 ] mean value: 0.9407142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.95238095 0.81818182 0.9047619 0.86363636 0.91304348 0.95454545 0.875 0.7826087 0.86363636] mean value: 0.8927795031055901 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.78 Accuracy on Blind test: 0.91 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), 
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04169512 0.08199573 0.03944111 0.04006433 0.07459378 0.03755236 0.03834867 0.08876681 0.0664711 0.06832337] mean value: 0.05772523880004883 key: score_time value: [0.02241969 0.01253891 0.01246405 0.01248622 0.01242995 0.01240349 0.01240277 0.02495623 0.02107215 0.02245378] mean value: 0.016562724113464357 key: test_mcc value: [0.7565654 0.76500781 0.61969655 0.65871309 0.56190476 0.65871309 0.8047619 0.60952381 0.15171652 0.75858261] mean value: 0.6345185543993015 key: train_mcc value: [0.86920884 0.88556918 0.86942317 0.88016394 0.89657222 0.86966229 0.85830957 0.88567102 0.90217391 0.86961659] mean value: 0.8786370741686671 key: test_accuracy value: [0.87804878 0.87804878 0.80487805 0.82926829 0.7804878 0.82926829 0.90243902 0.80487805 0.575 0.875 ] 
mean value: 0.8157317073170731 key: train_accuracy value: [0.9346049 0.94277929 0.9346049 0.9400545 0.94822888 0.9346049 0.92915531 0.94277929 0.95108696 0.93478261] mean value: 0.9392681554318209 key: test_fscore value: [0.87179487 0.88372093 0.77777778 0.82051282 0.7804878 0.8372093 0.9047619 0.80952381 0.60465116 0.88372093] mean value: 0.8174161314830629 key: train_fscore value: [0.93478261 0.94308943 0.93406593 0.93989071 0.9476584 0.93333333 0.92896175 0.94214876 0.95108696 0.93442623] mean value: 0.9389444114569994 key: test_precision value: [0.89473684 0.82608696 0.875 0.84210526 0.8 0.81818182 0.9047619 0.80952381 0.56521739 0.82608696] mean value: 0.8161700942078517 key: train_precision value: [0.93478261 0.94054054 0.94444444 0.94505495 0.95555556 0.94915254 0.92896175 0.95 0.95108696 0.93956044] mean value: 0.9439139781380078 key: test_recall value: [0.85 0.95 0.7 0.8 0.76190476 0.85714286 0.9047619 0.80952381 0.65 0.95 ] mean value: 0.8233333333333334 key: train_recall value: [0.93478261 0.94565217 0.92391304 0.93478261 0.93989071 0.91803279 0.92896175 0.93442623 0.95108696 0.92934783] mean value: 0.9340876692801141 key: test_roc_auc value: [0.87738095 0.8797619 0.80238095 0.82857143 0.78095238 0.82857143 0.90238095 0.8047619 0.575 0.875 ] mean value: 0.8154761904761905 key: train_roc_auc value: [0.93460442 0.94277144 0.93463412 0.9400689 0.94820622 0.93455987 0.92915479 0.94275659 0.95108696 0.93478261] mean value: 0.9392625920646235 key: test_jcc value: [0.77272727 0.79166667 0.63636364 0.69565217 0.64 0.72 0.82608696 0.68 0.43333333 0.79166667] mean value: 0.6987496706192359 key: train_jcc value: [0.87755102 0.89230769 0.87628866 0.88659794 0.90052356 0.875 0.86734694 0.890625 0.90673575 0.87692308] mean value: 0.8849899637857348 MCC on Blind test: 0.55 Accuracy on Blind test: 0.8 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01761723 0.01124287 0.01097202 0.01078892 0.01084352 0.01088452 0.00967693 0.00969195 0.00969839 0.00965333] mean value: 0.01110696792602539 key: score_time value: [0.01886177 0.00984693 0.00961399 0.00942779 0.00946093 0.0087142 0.00870919 0.00880623 0.00862551 0.00872087] mean value: 0.010078740119934083 key: test_mcc value: [0.7197263 0.63994524 0.75714286 0.65871309 0.51551459 0.66668392 0.66668392 0.46428571 0. 
0.80403025] mean value: 0.5892725898158476 key: train_mcc value: [0.63006799 0.5755263 0.65839315 0.69024299 0.61906921 0.61938983 0.63030694 0.62998014 0.67983997 0.6148664 ] mean value: 0.6347682927775092 key: test_accuracy value: [0.85365854 0.80487805 0.87804878 0.82926829 0.75609756 0.82926829 0.82926829 0.73170732 0.5 0.9 ] mean value: 0.791219512195122 key: train_accuracy value: [0.8147139 0.78746594 0.82833787 0.84468665 0.80926431 0.80926431 0.8147139 0.8147139 0.83967391 0.80706522] mean value: 0.8169899893377562 key: test_fscore value: [0.83333333 0.82608696 0.87804878 0.82051282 0.75 0.82051282 0.82051282 0.73170732 0.47368421 0.9047619 ] mean value: 0.7859160964242731 key: train_fscore value: [0.81111111 0.78333333 0.82253521 0.84122563 0.80446927 0.80337079 0.80898876 0.81005587 0.8365651 0.80222841] mean value: 0.8123883481888775 key: test_precision value: [0.9375 0.73076923 0.85714286 0.84210526 0.78947368 0.88888889 0.88888889 0.75 0.5 0.86363636] mean value: 0.8048405176694651 key: train_precision value: [0.82954545 0.80113636 0.85380117 0.86285714 0.82285714 0.8265896 0.83236994 0.82857143 0.85310734 0.82285714] mean value: 0.8333692727120341 key: test_recall value: [0.75 0.95 0.9 0.8 0.71428571 0.76190476 0.76190476 0.71428571 0.45 0.95 ] mean value: 0.7752380952380953 key: train_recall value: [0.79347826 0.76630435 0.79347826 0.82065217 0.78688525 0.78142077 0.78688525 0.79234973 0.82065217 0.7826087 ] mean value: 0.7924714896650036 key: test_roc_auc value: [0.85119048 0.80833333 0.87857143 0.82857143 0.75714286 0.83095238 0.83095238 0.73214286 0.5 0.9 ] mean value: 0.7917857142857143 key: train_roc_auc value: [0.81477192 0.78752376 0.82843312 0.84475232 0.80920349 0.80918864 0.81463828 0.81465312 0.83967391 0.80706522] mean value: 0.8169903777619387 key: test_jcc value: [0.71428571 0.7037037 0.7826087 0.69565217 0.6 0.69565217 0.69565217 0.57692308 0.31034483 0.82608696] mean value: 0.6600909496411745 key: train_jcc value: [0.68224299 
0.64383562 0.69856459 0.72596154 0.6728972 0.6713615 0.67924528 0.68075117 0.71904762 0.66976744] mean value: 0.6843674955100508 MCC on Blind test: 0.55 Accuracy on Blind test: 0.8 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian 
Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01511908 0.01789021 0.02156711 0.01820374 0.01853132 0.01843572 0.01728106 0.01646686 0.01864266 0.01955032] mean value: 0.01816880702972412 key: score_time value: [0.00898719 0.01122427 0.01118827 0.01151872 0.01170158 0.01176333 0.01167345 0.01163459 0.01162744 0.01157379] mean value: 0.011289262771606445 key: test_mcc value: [0.698212 0.7197263 0.46494781 0.46494781 0.72229808 0.75714286 0.66496381 0.61152662 0.40824829 0.8510645 ] mean value: 0.6363078090294618 key: train_mcc value: [0.65746895 0.78653727 0.5557906 0.53460389 0.83671955 0.84197553 0.6508894 0.84760268 0.89151503 0.74762767] mean value: 0.7350730570093046 key: test_accuracy value: [0.82926829 0.85365854 0.68292683 0.68292683 0.85365854 0.87804878 0.80487805 0.80487805 0.7 0.925 ] mean value: 0.8015243902439024 key: train_accuracy value: [0.80926431 0.88828338 0.73841962 0.72479564 0.91825613 0.92098093 0.80381471 0.92370572 0.94565217 0.86141304] mean value: 0.8534585653358607 key: 
test_fscore value: [0.78787879 0.83333333 0.51851852 0.51851852 0.84210526 0.87804878 0.76470588 0.81818182 0.72727273 0.92307692] mean value: 0.7611640552779267 key: train_fscore value: [0.77124183 0.87905605 0.64963504 0.62453532 0.91891892 0.92098093 0.76 0.92265193 0.94623656 0.8411215 ] mean value: 0.8234378063262462 key: test_precision value: [1. 0.9375 1. 1. 0.94117647 0.9 1. 0.7826087 0.66666667 0.94736842] mean value: 0.9175320253959708 key: train_precision value: [0.96721311 0.96129032 0.98888889 0.98823529 0.90909091 0.91847826 0.97435897 0.93296089 0.93617021 0.98540146] mean value: 0.9562088331135449 key: test_recall value: [0.65 0.75 0.35 0.35 0.76190476 0.85714286 0.61904762 0.85714286 0.8 0.9 ] mean value: 0.6895238095238095 key: train_recall value: [0.64130435 0.80978261 0.48369565 0.45652174 0.92896175 0.92349727 0.62295082 0.91256831 0.95652174 0.73369565] mean value: 0.7469499881206938 key: test_roc_auc value: [0.825 0.85119048 0.675 0.675 0.85595238 0.87857143 0.80952381 0.80357143 0.7 0.925 ] mean value: 0.7998809523809524 key: train_roc_auc value: [0.80972321 0.88849786 0.73911559 0.72552863 0.91828522 0.92098776 0.80332324 0.92367546 0.94565217 0.86141304] mean value: 0.853620218579235 key: test_jcc value: [0.65 0.71428571 0.35 0.35 0.72727273 0.7826087 0.61904762 0.69230769 0.57142857 0.85714286] mean value: 0.6314093877137356 key: train_jcc value: [0.62765957 0.78421053 0.48108108 0.45405405 0.85 0.85353535 0.61290323 0.85641026 0.89795918 0.72580645] mean value: 0.7143619706957444 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01723146 0.03948784 0.01812649 0.03602719 0.01928163 0.01920485 0.0195148 0.01716614 0.01644444 0.01767397] mean value: 0.022015881538391114 key: score_time value: [0.01169276 0.01263618 0.0126977 0.01592565 0.01178527 0.01175618 0.01171517 0.01169419 0.01178122 0.01174879] mean value: 0.012343311309814453 key: test_mcc value: [0.85441771 0.65915306 0.58066054 0.80907152 0.80907152 0.75714286 0.90649828 0.59335232 0.22941573 0.53881591] mean value: 0.6737599446817913 key: train_mcc value: [0.83208515 0.66194954 0.64802546 0.84785015 0.86084709 0.88095976 0.79006361 0.84482196 0.33933982 0.48038446] mean value: 0.7186327015117803 key: test_accuracy value: [0.92682927 0.80487805 0.7804878 0.90243902 0.90243902 0.87804878 0.95121951 0.7804878 0.55 0.725 ] mean value: 0.8201829268292683 key: train_accuracy value: [0.91553134 0.80653951 0.80381471 0.92370572 0.92915531 0.9400545 0.89100817 0.92098093 0.60326087 0.6875 ] mean value: 0.8421551060300912 key: test_fscore value: [0.92307692 0.75 0.8 0.9047619 0.9 0.87804878 0.95454545 0.81632653 0.18181818 0.62068966] mean value: 0.7729267430474928 key: train_fscore value: [0.91364903 0.76254181 0.83333333 0.92513369 0.93157895 0.94117647 0.89795918 0.92388451 0.34234234 0.54545455] mean value: 0.8017053858125319 key: test_precision value: [0.94736842 1. 0.72 0.86363636 0.94736842 0.9 0.91304348 0.71428571 1. 1. ] mean value: 0.900570239828821 key: train_precision value: [0.93714286 0.99130435 0.72580645 0.91052632 0.89847716 0.92146597 0.84210526 0.88888889 1. 1. 
] mean value: 0.9115717250364899 key: test_recall value: [0.9 0.6 0.9 0.95 0.85714286 0.85714286 1. 0.95238095 0.1 0.45 ] mean value: 0.7566666666666666 key: train_recall value: [0.89130435 0.61956522 0.97826087 0.94021739 0.96721311 0.96174863 0.96174863 0.96174863 0.20652174 0.375 ] mean value: 0.7863328581610833 key: test_roc_auc value: [0.92619048 0.8 0.78333333 0.90357143 0.90357143 0.87857143 0.95 0.77619048 0.55 0.725 ] mean value: 0.8196428571428571 key: train_roc_auc value: [0.91559753 0.80705037 0.80333809 0.92366061 0.92925873 0.94011345 0.8912004 0.92109171 0.60326087 0.6875 ] mean value: 0.8422071751009741 key: test_jcc value: [0.85714286 0.6 0.66666667 0.82608696 0.81818182 0.7826087 0.91304348 0.68965517 0.1 0.45 ] mean value: 0.6703385644839918 key: train_jcc value: [0.84102564 0.61621622 0.71428571 0.86069652 0.87192118 0.88888889 0.81481481 0.85853659 0.20652174 0.375 ] mean value: 0.7047907299406508 MCC on Blind test: 0.62 Accuracy on Blind test: 0.83 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18305397 0.1674881 0.16102242 0.16084981 0.15811038 0.15780878 0.15861988 0.16229534 0.16273403 0.16005588] mean value: 0.16320385932922363 key: score_time value: [0.01662803 0.01653552 0.01613188 0.01542115 0.01611686 0.01628065 0.01615357 0.01622725 0.01625323 0.01510215] mean value: 0.016085028648376465 key: test_mcc value: [1. 1. 0.7565654 0.80817439 0.80907152 0.85441771 1. 0.7633652 0.65081403 0.85972695] mean value: 0.8502135194128032 key: train_mcc value: [0.99456506 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9994565056421516 key: test_accuracy value: [1. 1. 0.87804878 0.90243902 0.90243902 0.92682927 1. 0.87804878 0.825 0.925 ] mean value: 0.9237804878048781 key: train_accuracy value: [0.9972752 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9997275204359672 key: test_fscore value: [1. 1. 0.87179487 0.89473684 0.9 0.93023256 1. 0.88888889 0.82926829 0.93023256] mean value: 0.924515401175102 key: train_fscore value: [0.99728997 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9997289972899729 key: test_precision value: [1. 1. 0.89473684 0.94444444 0.94736842 0.90909091 1. 0.83333333 0.80952381 0.86956522] mean value: 0.9208062976941696 key: train_precision value: [0.99459459 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9994594594594595 key: test_recall value: [1. 1. 0.85 0.85 0.85714286 0.95238095 1. 0.95238095 0.85 1. ] mean value: 0.9311904761904761 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.87738095 0.90119048 0.90357143 0.92619048 1. 
0.87619048 0.825 0.925 ] mean value: 0.9234523809523809 key: train_roc_auc value: [0.99726776 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9997267759562841 key: test_jcc value: [1. 1. 0.77272727 0.80952381 0.81818182 0.86956522 1. 0.8 0.70833333 0.86956522] mean value: 0.8647896668548842 key: train_jcc value: [0.99459459 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9994594594594595 MCC on Blind test: 0.76 Accuracy on Blind test: 0.9 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06312132 0.05718136 0.0733006 0.07361269 0.07186341 0.06613135 0.07574439 0.06043625 0.05493975 0.07683802] mean value: 0.06731691360473632 key: score_time value: [0.0210464 0.02701759 0.02388501 0.03913212 0.02821994 0.0386188 0.03353763 0.02436566 0.03699136 0.03753781] mean value: 0.031035232543945312 key: test_mcc value: [1. 0.95238095 0.7098505 0.85441771 0.8547619 0.90649828 0.95227002 0.86240942 0.65081403 0.80403025] mean value: 0.854743305832982 key: train_mcc value: [0.99456522 0.97825894 0.97825894 0.98910074 0.97820147 0.98378331 0.99456506 0.98910074 0.98918887 0.99457991] mean value: 0.9869603176262142 key: test_accuracy value: [1. 
0.97560976 0.85365854 0.92682927 0.92682927 0.95121951 0.97560976 0.92682927 0.825 0.9 ] mean value: 0.9261585365853658 key: train_accuracy value: [0.9972752 0.98910082 0.98910082 0.99455041 0.98910082 0.99182561 0.9972752 0.99455041 0.99456522 0.99728261] mean value: 0.993462711764009 key: test_fscore value: [1. 0.97560976 0.84210526 0.92307692 0.92682927 0.95454545 0.97674419 0.93333333 0.82926829 0.9047619 ] mean value: 0.9266274381995193 key: train_fscore value: [0.9972752 0.98918919 0.98918919 0.99456522 0.98907104 0.99186992 0.99726027 0.99453552 0.99453552 0.9972752 ] mean value: 0.9934766273663551 key: test_precision value: [1. 0.95238095 0.88888889 0.94736842 0.95 0.91304348 0.95454545 0.875 0.80952381 0.86363636] mean value: 0.915438736828897 key: train_precision value: [1. 0.98387097 0.98387097 0.99456522 0.98907104 0.98387097 1. 0.99453552 1. 1. ] mean value: 0.992978467799416 key: test_recall value: [1. 1. 0.8 0.9 0.9047619 1. 1. 1. 0.85 0.95 ] mean value: 0.9404761904761905 key: train_recall value: [0.99456522 0.99456522 0.99456522 0.99456522 0.98907104 1. 0.99453552 0.99453552 0.98913043 0.99456522] mean value: 0.9940098598241862 key: test_roc_auc value: [1. 0.97619048 0.85238095 0.92619048 0.92738095 0.95 0.975 0.925 0.825 0.9 ] mean value: 0.9257142857142857 key: train_roc_auc value: [0.99728261 0.98908589 0.98908589 0.99455037 0.98910074 0.99184783 0.99726776 0.99455037 0.99456522 0.99728261] mean value: 0.9934619268234735 key: test_jcc value: [1. 
0.95238095 0.72727273 0.85714286 0.86363636 0.91304348 0.95454545 0.875 0.70833333 0.82608696] mean value: 0.8677442123094297 key: train_jcc value: [0.99456522 0.97860963 0.97860963 0.98918919 0.97837838 0.98387097 0.99453552 0.98913043 0.98913043 0.99456522] mean value: 0.987058461011991 MCC on Blind test: 0.75 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.09550643 0.09747839 0.07713652 0.11133432 0.1167531 0.05979848 0.10996175 0.11596441 0.05582666 0.06558204] mean value: 0.09053421020507812 key: score_time value: [0.02205396 0.0137794 0.02239418 0.01748085 0.01414633 0.01440907 0.02215457 0.02253461 0.01375127 0.02219558] mean value: 0.018489980697631837 key: test_mcc value: [0.75714286 0.71121921 0.56190476 0.41428571 0.12142857 0.41766229 0.51551459 0.47439956 0. 0.65081403] mean value: 0.46243715854258727 key: train_mcc value: [1. 0.99456506 0.99456506 0.99456506 1. 0.99456522 0.99456522 1. 0.99457991 0.99457991] mean value: 0.9961985415816125 key: test_accuracy value: [0.87804878 0.85365854 0.7804878 0.70731707 0.56097561 0.70731707 0.75609756 0.73170732 0.5 0.825 ] mean value: 0.7300609756097561 key: train_accuracy value: [1. 0.9972752 0.9972752 0.9972752 1. 
0.9972752 0.9972752 1. 0.99728261 0.99728261] mean value: 0.9980941239189669 key: test_fscore value: [0.87804878 0.85714286 0.7804878 0.7 0.57142857 0.7 0.75 0.76595745 0.47368421 0.82926829] mean value: 0.7306017963955036 key: train_fscore value: [1. 0.99728997 0.99728997 0.99728997 1. 0.9972752 0.9972752 1. 0.99728997 0.99728997] mean value: 0.9981000273217991 key: test_precision value: [0.85714286 0.81818182 0.76190476 0.7 0.57142857 0.73684211 0.78947368 0.69230769 0.5 0.80952381] mean value: 0.7236805299963195 key: train_precision value: [1. 0.99459459 0.99459459 0.99459459 1. 0.99456522 0.99456522 1. 0.99459459 0.99459459] mean value: 0.9962103407755581 key: test_recall value: [0.9 0.9 0.8 0.7 0.57142857 0.66666667 0.71428571 0.85714286 0.45 0.85 ] mean value: 0.7409523809523809 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87857143 0.8547619 0.78095238 0.70714286 0.56071429 0.70833333 0.75714286 0.72857143 0.5 0.825 ] mean value: 0.7301190476190477 key: train_roc_auc value: [1. 0.99726776 0.99726776 0.99726776 1. 0.99728261 0.99728261 1. 0.99728261 0.99728261] mean value: 0.9980933713471133 key: test_jcc value: [0.7826087 0.75 0.64 0.53846154 0.4 0.53846154 0.6 0.62068966 0.31034483 0.70833333] mean value: 0.5888899588667205 key: train_jcc value: [1. 0.99459459 0.99459459 0.99459459 1. 0.99456522 0.99456522 1. 
0.99459459 0.99459459] mean value: 0.9962103407755581 MCC on Blind test: 0.5 Accuracy on Blind test: 0.78 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.61533713 0.6027925 0.6025629 0.59919167 0.60118747 0.60153985 0.60033226 0.61004019 0.60367298 0.60337877] mean value: 0.6040035724639893 key: score_time value: [0.00962615 0.01016498 0.00934196 0.00935721 0.0094378 0.00935674 0.00931764 0.00950098 0.00953007 0.00993609] mean value: 0.009556961059570313 key: test_mcc value: [1. 0.95238095 0.7098505 0.90238095 0.8547619 0.86240942 0.90649828 0.86240942 0.70352647 0.75858261] mean value: 0.8512800501192865 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97560976 0.85365854 0.95121951 0.92682927 0.92682927 0.95121951 0.92682927 0.85 0.875 ] mean value: 0.9237195121951219 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.97560976 0.84210526 0.95 0.92682927 0.93333333 0.95454545 0.93333333 0.85714286 0.88372093] mean value: 0.9256620196135675 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 
0.95238095 0.88888889 0.95 0.95 0.875 0.91304348 0.875 0.81818182 0.82608696] mean value: 0.9048582094234268 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.8 0.95 0.9047619 1. 1. 1. 0.9 0.95 ] mean value: 0.9504761904761905 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.97619048 0.85238095 0.95119048 0.92738095 0.925 0.95 0.925 0.85 0.875 ] mean value: 0.9232142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.95238095 0.72727273 0.9047619 0.86363636 0.875 0.91304348 0.875 0.75 0.79166667] mean value: 0.8652762092979485 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.73 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, 
num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02614141 0.02861094 0.02827191 0.02825522 0.02861452 0.0286746 0.02852798 0.02840257 0.04027748 0.03529644] mean value: 0.03010730743408203 key: score_time value: [0.01253939 0.01268291 0.01283693 0.01493955 0.01348925 0.01353598 0.01356745 0.01352119 0.0201292 0.01334429] mean value: 0.014058613777160644 key: test_mcc value: [0.30603535 0.41428571 0.36718832 0.46300848 0.4373371 0.56836003 0.46300848 0.56190476 0.30151134 0.40201513] mean value: 0.42846547057135465 key: train_mcc value: [0.85739567 0.94039882 0.78808937 0.98378331 0.94600476 0.92132206 0.93083981 0.94639726 0.94615534 0.96208441] mean value: 0.922247080457379 key: test_accuracy value: [0.63414634 0.70731707 0.68292683 0.73170732 0.70731707 0.7804878 0.73170732 0.7804878 0.65 0.7 ] mean value: 0.7106097560975609 key: train_accuracy value: [0.92370572 0.97002725 0.88555858 0.99182561 0.97275204 0.95912807 0.96457766 0.97275204 0.97282609 0.98097826] mean value: 0.9594131323302926 key: test_fscore value: [0.69387755 0.7 0.64864865 0.71794872 0.66666667 0.76923077 0.74418605 0.7804878 0.66666667 0.68421053] mean value: 0.7071923397887343 key: train_fscore value: [0.92929293 0.97050938 0.89655172 0.99178082 0.97222222 0.95726496 0.96551724 0.97206704 0.97237569 0.98082192] mean value: 0.9608403927115273 key: test_precision value: [0.5862069 0.7 0.70588235 0.73684211 0.8 0.83333333 0.72727273 0.8 0.63636364 0.72222222] mean value: 0.7248123273947977 key: train_precision value: [0.86792453 0.95767196 0.81981982 1. 0.98870056 1. 
0.93814433 0.99428571 0.98876404 0.98895028] mean value: 0.9544261236134951 key: test_recall value: [0.85 0.7 0.6 0.7 0.57142857 0.71428571 0.76190476 0.76190476 0.7 0.65 ] mean value: 0.7009523809523809 key: train_recall value: [1. 0.98369565 0.98913043 0.98369565 0.95628415 0.91803279 0.99453552 0.95081967 0.95652174 0.97282609] mean value: 0.9705541696364932 key: test_roc_auc value: [0.63928571 0.70714286 0.68095238 0.73095238 0.71071429 0.78214286 0.73095238 0.78095238 0.65 0.7 ] mean value: 0.7113095238095238 key: train_roc_auc value: [0.92349727 0.9699899 0.8852756 0.99184783 0.97270729 0.95901639 0.96465906 0.97269244 0.97282609 0.98097826] mean value: 0.9593490140175813 key: test_jcc value: [0.53125 0.53846154 0.48 0.56 0.5 0.625 0.59259259 0.64 0.5 0.52 ] mean value: 0.5487304131054132 key: train_jcc value: [0.86792453 0.94270833 0.8125 0.98369565 0.94594595 0.91803279 0.93333333 0.94565217 0.94623656 0.96236559] mean value: 0.9258394904424336 MCC on Blind test: 0.3 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02562833 0.05641294 0.03336596 0.03528643 0.03723907 0.03774786 0.03155208 0.03795075 0.03786731 0.03780031] mean value: 0.037085103988647464 key: score_time value: [0.01957393 0.03293657 0.0218451 0.01995754 0.01814795 0.01629925 0.02359295 0.02411532 0.02410388 0.02387118] mean value: 0.02244436740875244 key: test_mcc value: [0.85441771 0.80907152 0.7098505 0.70714286 0.76500781 0.70714286 0.8047619 0.7633652 0.35400522 0.90453403] mean value: 0.7379299602053512 key: train_mcc value: [0.83670017 0.84197084 0.82561178 0.83107125 0.84762076 0.83656559 0.83118002 0.85318761 0.88588265 0.82095534] mean value: 0.8410746004220283 key: test_accuracy value: [0.92682927 0.90243902 0.85365854 0.85365854 0.87804878 0.85365854 0.90243902 0.87804878 0.675 0.95 ] mean value: 0.8673780487804879 key: train_accuracy value: [0.91825613 0.92098093 0.91280654 0.91553134 0.92370572 0.91825613 0.91553134 0.92643052 0.94293478 0.91032609] mean value: 0.9204759507167397 key: test_fscore value: [0.92307692 0.9047619 0.84210526 0.85 0.87179487 0.85714286 0.9047619 0.88888889 0.69767442 0.95238095] mean value: 0.8692587984570849 key: train_fscore value: [0.91935484 0.92140921 0.91304348 0.91598916 0.92432432 0.91847826 0.91598916 0.92722372 0.94277929 0.91152815] mean value: 0.9210119597403508 key: test_precision value: [0.94736842 0.86363636 0.88888889 0.85 0.94444444 0.85714286 0.9047619 0.83333333 0.65217391 0.90909091] mean value: 0.8650841035394811 key: train_precision value: [0.90957447 0.91891892 0.91304348 0.91351351 0.9144385 0.91351351 0.90860215 0.91489362 
0.94535519 0.8994709 ] mean value: 0.915132425325236 key: test_recall value: [0.9 0.95 0.8 0.85 0.80952381 0.85714286 0.9047619 0.95238095 0.75 1. ] mean value: 0.8773809523809524 key: train_recall value: [0.92934783 0.92391304 0.91304348 0.91847826 0.93442623 0.92349727 0.92349727 0.93989071 0.94021739 0.92391304] mean value: 0.9270224518888097 key: test_roc_auc value: [0.92619048 0.90357143 0.85238095 0.85357143 0.8797619 0.85357143 0.90238095 0.87619048 0.675 0.95 ] mean value: 0.8672619047619048 key: train_roc_auc value: [0.91822583 0.92097292 0.91280589 0.91552328 0.92373485 0.91827037 0.91555298 0.92646709 0.94293478 0.91032609] mean value: 0.920481408885721 key: test_jcc value: [0.85714286 0.82608696 0.72727273 0.73913043 0.77272727 0.75 0.82608696 0.8 0.53571429 0.90909091] mean value: 0.7743252399774139 key: train_jcc value: [0.85074627 0.85427136 0.84 0.845 0.85929648 0.84924623 0.845 0.86432161 0.89175258 0.83743842] mean value: 0.8537072948013584 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.2605021 0.26175451 0.26747704 0.42438817 0.33081579 0.27082133 0.26918602 0.26025081 0.24329901 0.27351165] mean value: 0.2862006425857544 key: score_time value: [0.02257133 0.02124763 0.0225091 0.03519368 0.02244186 0.02156854 0.01980209 0.02273059 0.01768684 0.02059722] mean value: 0.0226348876953125 key: test_mcc value: [0.85441771 0.80907152 0.7098505 0.70714286 0.76500781 0.65871309 0.75714286 0.7633652 0.25286087 0.90453403] mean value: 0.7182106443016628 key: train_mcc value: [0.83670017 0.84197084 0.78774111 0.83107125 0.84762076 0.76096804 0.76064422 0.85318761 0.80477581 0.82095534] mean value: 0.8145635147399292 key: test_accuracy value: [0.92682927 0.90243902 0.85365854 0.85365854 0.87804878 0.82926829 0.87804878 0.87804878 0.625 0.95 ] mean value: 0.8575 key: train_accuracy value: [0.91825613 0.92098093 0.89373297 0.91553134 0.92370572 0.88010899 0.88010899 0.92643052 0.90217391 0.91032609] mean value: 0.9071355585831062 key: test_fscore value: [0.92307692 0.9047619 0.84210526 0.85 0.87179487 0.8372093 0.87804878 0.88888889 0.65116279 0.95238095] mean value: 0.8599429677572497 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:176: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:179: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.91935484 0.92140921 0.89544236 0.91598916 0.92432432 0.88235294 0.88172043 0.92722372 0.90374332 0.91152815] mean value: 0.9083088452869689 key: test_precision value: [0.94736842 0.86363636 0.88888889 0.85 0.94444444 0.81818182 0.9 0.83333333 0.60869565 0.90909091] mean value: 0.8563639830802302 key: train_precision value: [0.90957447 0.91891892 0.88359788 0.91351351 0.9144385 0.86387435 0.86772487 0.91489362 0.88947368 0.8994709 ] mean value: 0.8975480700766527 key: test_recall value: [0.9 0.95 0.8 0.85 0.80952381 0.85714286 0.85714286 0.95238095 0.7 1. 
] mean value: 0.8676190476190476 key: train_recall value: [0.92934783 0.92391304 0.9076087 0.91847826 0.93442623 0.90163934 0.89617486 0.93989071 0.91847826 0.92391304] mean value: 0.9193870277975766 key: test_roc_auc value: [0.92619048 0.90357143 0.85238095 0.85357143 0.8797619 0.82857143 0.87857143 0.87619048 0.625 0.95 ] mean value: 0.8573809523809524 key: train_roc_auc value: [0.91822583 0.92097292 0.89369506 0.91552328 0.92373485 0.8801675 0.88015265 0.92646709 0.90217391 0.91032609] mean value: 0.9071439177952008 key: test_jcc value: [0.85714286 0.82608696 0.72727273 0.73913043 0.77272727 0.72 0.7826087 0.8 0.48275862 0.90909091] mean value: 0.7616818473879943 key: train_jcc value: [0.85074627 0.85427136 0.81067961 0.845 0.85929648 0.78947368 0.78846154 0.86432161 0.82439024 0.83743842] mean value: 0.8324079217763206 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04052091 0.04298687 0.05086136 0.05352783 0.04348326 0.04401159 0.04306316 0.05227876 0.05232954 0.053303 ] mean value: 0.04763662815093994 key: score_time value: [0.01334476 0.01325011 0.0122745 0.01240516 0.02166629 0.01318216 0.01324201 0.01337886 0.01333618 0.0133636 ] mean value: 0.013944363594055176 key: test_mcc value: [0.63990931 0.71955846 0.69383117 0.76905945 0.8049036 0.74951538 0.82027988 0.7306455 0.76989735 0.6634888 ] mean value: 0.7361088913855512 key: train_mcc value: [0.78114195 0.78136227 0.78586682 0.79445673 0.76450976 0.78590099 0.78943823 0.77028847 0.78900234 0.77715074] mean value: 0.7819118302880803 key: test_accuracy value: [0.81981982 0.85585586 0.84684685 0.88288288 0.9009009 0.87387387 0.90990991 0.86486486 0.88181818 0.82727273] mean value: 0.8664045864045864 key: train_accuracy value: [0.88966901 0.88966901 0.89167503 0.89568706 0.88164493 0.89167503 0.89368104 0.88465396 0.89378758 0.88777555] mean value: 0.8899918191448092 key: test_fscore value: [0.81481481 0.86440678 0.84684685 0.88695652 0.90598291 0.87931034 0.9122807 0.86956522 0.88888889 0.84033613] mean value: 0.8709389156360662 key: train_fscore value: [0.89341085 0.89361702 0.89595376 0.90019194 0.88476562 0.8957529 0.89728682 0.88736533 0.89688716 0.89126214] mean value: 0.8936493535818284 key: test_precision value: [0.83018868 0.80952381 0.83928571 0.85 0.86885246 0.85 0.89655172 0.84745763 0.83870968 0.78125 ] mean value: 0.841181969074713 key: train_precision value: [0.86491557 0.8635514 0.86270872 0.86372007 0.86121673 0.86245353 0.8670412 0.86615679 
0.87145558 0.86440678] mean value: 0.8647626371740085 key: test_recall value: [0.8 0.92727273 0.85454545 0.92727273 0.94642857 0.91071429 0.92857143 0.89285714 0.94545455 0.90909091] mean value: 0.9042207792207793 key: train_recall value: [0.9238477 0.9258517 0.93186373 0.93987976 0.90963855 0.93172691 0.92971888 0.90963855 0.9238477 0.91983968] mean value: 0.9245853152087308 key: test_roc_auc value: [0.81964286 0.85649351 0.84691558 0.88327922 0.90048701 0.87353896 0.90974026 0.86461039 0.88181818 0.82727273] mean value: 0.8663798701298701 key: train_roc_auc value: [0.88963469 0.88963268 0.89163467 0.89564269 0.88167298 0.89171516 0.89371715 0.884679 0.89378758 0.88777555] mean value: 0.8899892153785482 key: test_jcc value: [0.6875 0.76119403 0.734375 0.796875 0.828125 0.78461538 0.83870968 0.76923077 0.8 0.72463768] mean value: 0.7725262542275675 key: train_jcc value: [0.80735552 0.80769231 0.81151832 0.81849913 0.79334501 0.81118881 0.81370826 0.79753521 0.81305115 0.80385289] mean value: 0.8077746603706929 MCC on Blind test: 0.64 Accuracy on Blind test: 0.84 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.037637 1.08531857 1.08370805 1.19189143 1.05426836 1.07793021 0.96841478 1.15566444 1.01701474 1.18346882] mean value: 1.0855316400527955 key: score_time value: [0.01516891 0.01343751 0.01641273 0.01554537 0.01356006 0.0157094 0.01250768 0.01551104 0.01563716 0.01549435] mean value: 0.014898419380187988 key: test_mcc value: [0.69373177 0.78859019 0.78376623 0.76698119 0.83897362 0.78420577 0.82447186 0.74951538 0.7823356 0.5938157 ] mean value: 0.7606387310061538 key: train_mcc value: [0.87069787 0.85462868 0.85611966 0.84294292 0.8064405 0.84259254 0.86623116 0.7977593 0.84675474 0.84990593] mean value: 0.8434073301854806 key: test_accuracy value: [0.84684685 0.89189189 0.89189189 0.88288288 0.91891892 0.89189189 0.90990991 0.87387387 0.89090909 0.79090909] mean value: 0.878992628992629 key: train_accuracy value: [0.93480441 0.92678034 0.92778335 0.92076229 0.90270812 0.92076229 0.9327984 0.89869609 0.92284569 0.9248497 ] mean value: 0.9212790676639135 key: test_fscore value: [0.8440367 0.89655172 0.89090909 0.88495575 0.92173913 0.89473684 0.91525424 0.87931034 0.89285714 0.80991736] mean value: 0.8830268317391928 key: train_fscore value: [0.93646139 0.92864125 0.92913386 0.92307692 0.9049951 0.92262488 0.93399015 0.90009891 0.92473118 0.92566898] mean value: 0.92294226227868 key: test_precision value: [0.85185185 0.85245902 0.89090909 0.86206897 0.89830508 0.87931034 0.87096774 0.85 0.87719298 0.74242424] mean value: 0.8575489321060842 key: train_precision value: [0.91412214 0.90648855 0.91295938 0.89772727 0.8833652 0.90057361 0.91682785 
0.88693957 0.90267176 0.91568627] mean value: 0.9037361609709368 key: test_recall value: [0.83636364 0.94545455 0.89090909 0.90909091 0.94642857 0.91071429 0.96428571 0.91071429 0.90909091 0.89090909] mean value: 0.9113961038961038 key: train_recall value: [0.95991984 0.95190381 0.94589178 0.9498998 0.92771084 0.94578313 0.95180723 0.91365462 0.94789579 0.93587174] mean value: 0.9430338588824234 key: test_roc_auc value: [0.84675325 0.89237013 0.89188312 0.88311688 0.91866883 0.89172078 0.90941558 0.87353896 0.89090909 0.79090909] mean value: 0.8789285714285714 key: train_roc_auc value: [0.9347792 0.92675512 0.92776517 0.92073303 0.90273318 0.92078736 0.93281744 0.89871108 0.92284569 0.9248497 ] mean value: 0.9212776959541573 key: test_jcc value: [0.73015873 0.8125 0.80327869 0.79365079 0.85483871 0.80952381 0.84375 0.78461538 0.80645161 0.68055556] mean value: 0.7919323284609509 key: train_jcc value: [0.88051471 0.86678832 0.86764706 0.85714286 0.82647585 0.85636364 0.87615527 0.81834532 0.86 0.86162362] mean value: 0.8571056637111273 MCC on Blind test: 0.66 Accuracy on Blind test: 0.86 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01727653 0.01254821 0.01245856 0.01221752 0.01246524 0.01370215 0.01368165 0.01427627 0.01377892 0.01384521] mean value: 0.013625025749206543 key: score_time value: [0.01262164 0.00979233 0.00930595 0.00934267 0.01013041 0.01029086 0.01028466 0.01063132 0.01032662 0.01033711] mean value: 0.010306358337402344 key: test_mcc value: [0.5928164 0.49641957 0.50003497 0.42337662 0.75189742 0.53770284 0.58571429 0.35400098 0.58191437 0.47343208] mean value: 0.5297309536091728 key: train_mcc value: [0.53906794 0.53171804 0.55439159 0.57014286 0.55399337 0.55772149 0.53001234 0.54540938 0.56790568 0.56939559] mean value: 0.5519758288087392 key: test_accuracy value: [0.79279279 0.74774775 0.74774775 0.71171171 0.87387387 0.76576577 0.79279279 0.67567568 0.79090909 0.73636364] mean value: 0.7635380835380835 key: train_accuracy value: [0.76930792 0.76529589 0.77632899 0.78435306 0.77632899 0.77833501 0.76429288 0.77231695 0.78356713 0.78456914] mean value: 0.7754695951582201 key: test_fscore value: [0.77227723 0.73584906 0.7254902 0.70909091 0.88135593 0.75 0.79279279 0.66037736 0.78899083 0.74336283] mean value: 0.7559587130529115 key: train_fscore value: [0.76482618 0.75776398 0.76746611 0.77673936 0.76795005 0.77098446 0.75495308 0.76573787 0.77777778 0.78128179] mean value: 0.7685480644155679 key: test_precision value: [0.84782609 0.76470588 0.78723404 0.70909091 0.83870968 0.8125 0.8 0.7 0.7962963 0.72413793] mean value: 0.7780500825703698 key: train_precision value: [0.78079332 0.78372591 0.8 0.80603448 0.79697624 0.79657388 0.78524946 0.78768577 0.79915433 0.79338843] mean 
value: 0.7929581826379648 key: test_recall value: [0.70909091 0.70909091 0.67272727 0.70909091 0.92857143 0.69642857 0.78571429 0.625 0.78181818 0.76363636] mean value: 0.7381168831168832 key: train_recall value: [0.749499 0.73346693 0.73747495 0.749499 0.74096386 0.74698795 0.72690763 0.74497992 0.75751503 0.76953908] mean value: 0.7456833345405671 key: test_roc_auc value: [0.79204545 0.7474026 0.74707792 0.71168831 0.87337662 0.7663961 0.79285714 0.67613636 0.79090909 0.73636364] mean value: 0.7634253246753246 key: train_roc_auc value: [0.76932781 0.76532784 0.776368 0.78438805 0.77629355 0.7783036 0.76425542 0.77228956 0.78356713 0.78456914] mean value: 0.7754690103097762 key: test_jcc value: [0.62903226 0.58208955 0.56923077 0.54929577 0.78787879 0.6 0.65671642 0.49295775 0.65151515 0.5915493 ] mean value: 0.6110265753739887 key: train_jcc value: [0.6192053 0.61 0.62267343 0.63497453 0.62331081 0.62731872 0.60636516 0.62040134 0.63636364 0.64106845] mean value: 0.6241681375865916 MCC on Blind test: 0.44 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0152173 0.01704001 0.01721168 0.02113652 0.01699519 0.01698351 0.01694894 0.01697063 0.0170331 0.01704311] mean value: 0.017258000373840333 key: score_time value: [0.0123899 0.01239848 0.01244283 0.0125103 0.01237535 0.01237679 0.01236582 0.01238203 0.01243782 0.01241136] mean value: 0.012409067153930664 key: test_mcc value: [0.60674852 0.53168696 0.51517746 0.56873266 0.71205754 0.51517746 0.67598342 0.58620801 0.73323558 0.56400939] mean value: 0.6009016993199549 key: train_mcc value: [0.63491093 0.63543061 0.62142983 0.61525929 0.6391103 0.66099253 0.62317718 0.65531326 0.65761181 0.63728607] mean value: 0.6380521801166391 key: test_accuracy value: [0.8018018 0.76576577 0.75675676 0.78378378 0.85585586 0.75675676 0.83783784 0.79279279 0.86363636 0.78181818] mean value: 0.7996805896805896 key: train_accuracy value: [0.81745236 0.81745236 0.81043129 0.80742227 0.81945838 0.83049147 0.8114343 0.82748245 0.82865731 0.81863727] mean value: 0.8188919463802228 key: test_fscore value: [0.78846154 0.75925926 0.74285714 0.77358491 0.85964912 0.76923077 0.84210526 0.8 0.87179487 0.78571429] mean value: 0.7992657158943157 key: train_fscore value: [0.81726908 0.81390593 0.80655067 0.80408163 0.82142857 0.82980866 0.80816327 0.83003953 0.83119447 0.81809045] mean value: 0.818053225190839 key: test_precision value: [0.83673469 0.77358491 0.78 0.80392157 0.84482759 0.73770492 0.82758621 0.77966102 0.82258065 0.77192982] mean value: 0.7978531365973461 key: train_precision value: [0.81891348 0.8308977 0.82426778 0.81912682 0.81176471 0.83232323 0.82157676 0.81712062 0.81906615 
0.82056452] mean value: 0.821562177423608 key: test_recall value: [0.74545455 0.74545455 0.70909091 0.74545455 0.875 0.80357143 0.85714286 0.82142857 0.92727273 0.8 ] mean value: 0.802987012987013 key: train_recall value: [0.81563126 0.79759519 0.78957916 0.78957916 0.8313253 0.82730924 0.79518072 0.84337349 0.84368737 0.81563126] mean value: 0.8148892161833707 key: test_roc_auc value: [0.8012987 0.76558442 0.75633117 0.78344156 0.85568182 0.75633117 0.83766234 0.79253247 0.86363636 0.78181818] mean value: 0.7994318181818182 key: train_roc_auc value: [0.81745419 0.81747229 0.81045223 0.80744018 0.81947027 0.83048829 0.81141802 0.82749837 0.82865731 0.81863727] mean value: 0.8188988418604277 key: test_jcc value: [0.65079365 0.6119403 0.59090909 0.63076923 0.75384615 0.625 0.72727273 0.66666667 0.77272727 0.64705882] mean value: 0.6676983915021667 key: train_jcc value: [0.6910017 0.6862069 0.67581475 0.67235495 0.6969697 0.7091222 0.67808219 0.70945946 0.71114865 0.69217687] mean value: 0.6922337365141537 MCC on Blind test: 0.5 Accuracy on Blind test: 0.79 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01553726 0.0123558 0.01168871 0.0119946 0.01135278 0.01145506 0.01270914 0.01214385 0.01241255 0.01243854] mean value: 0.012408828735351563 key: score_time value: [0.03279448 0.01701331 0.01946092 0.01539755 0.01587725 0.01698089 0.01995373 0.01609039 0.01584721 0.01604939] mean value: 0.018546509742736816 key: test_mcc value: [0.65081289 0.49545455 0.48459368 0.53318254 0.62382476 0.61299389 0.72309474 0.49561285 0.5731902 0.56400939] mean value: 0.5756769480530834 key: train_mcc value: [0.73735103 0.69400592 0.73861246 0.7497858 0.73622878 0.74649978 0.73511426 0.72317842 0.72360009 0.74076811] mean value: 0.7325144649619348 key: test_accuracy value: [0.81981982 0.74774775 0.73873874 0.76576577 0.81081081 0.8018018 0.85585586 0.74774775 0.78181818 0.78181818] mean value: 0.7851924651924652 key: train_accuracy value: [0.8665998 0.84553661 0.86760281 0.87261785 0.8665998 0.87061184 0.86459378 0.86058175 0.85871743 0.86873747] mean value: 0.8642199142517734 key: test_fscore value: [0.83333333 0.74545455 0.75630252 0.77192982 0.82051282 0.81967213 0.86885246 0.75438596 0.8 0.78571429] mean value: 0.7956157885661007 key: train_fscore value: [0.87345385 0.85249042 0.87380497 0.87939221 0.87223823 0.87772512 0.87252125 0.86544046 0.86735654 0.87464115] mean value: 0.8709064207475893 key: test_precision value: [0.76923077 0.74545455 0.703125 0.74576271 0.78688525 0.75757576 0.8030303 0.74137931 0.73846154 0.77192982] mean value: 0.7562835006425191 key: train_precision value: [0.83152174 0.81651376 0.83546618 0.83574007 0.83609576 0.83123878 0.82352941 0.83551402 
0.81737589 0.83699634] mean value: 0.8299991949383702 key: test_recall value: [0.90909091 0.74545455 0.81818182 0.8 0.85714286 0.89285714 0.94642857 0.76785714 0.87272727 0.8 ] mean value: 0.8409740259740259 key: train_recall value: [0.91983968 0.89178357 0.91583166 0.92785571 0.91164659 0.92971888 0.92771084 0.89759036 0.9238477 0.91583166] mean value: 0.9161656646626587 key: test_roc_auc value: [0.82061688 0.74772727 0.73944805 0.76607143 0.81038961 0.80097403 0.85503247 0.74756494 0.78181818 0.78181818] mean value: 0.785146103896104 key: train_roc_auc value: [0.86654635 0.84549018 0.86755439 0.87256239 0.86664494 0.87067106 0.86465702 0.86061883 0.85871743 0.86873747] mean value: 0.8642200062776154 key: test_jcc value: [0.71428571 0.5942029 0.60810811 0.62857143 0.69565217 0.69444444 0.76811594 0.6056338 0.66666667 0.64705882] mean value: 0.6622740002915428 key: train_jcc value: [0.77533784 0.74290484 0.77589134 0.78474576 0.77342419 0.78209459 0.77386935 0.76279863 0.76578073 0.77721088] mean value: 0.7714058165400389 MCC on Blind test: 0.43 Accuracy on Blind test: 0.74 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.05147576 0.05113029 0.05149794 0.05157065 0.05220318 0.05111527 0.05288601 0.05138397 0.05986619 0.06046939] mean value: 0.05335986614227295 key: score_time value: [0.01986051 0.01984072 0.01992202 0.01969242 0.01982999 0.01954079 0.01995969 0.0201807 0.0201447 0.01967216] mean value: 0.019864368438720702 key: test_mcc value: [0.71224427 0.66693232 0.65875884 0.69483296 0.80802876 0.64590588 0.7306455 0.67932297 0.77407027 0.67363307] mean value: 0.7044374854667038 key: train_mcc value: [0.74644043 0.7314568 0.75527303 0.74901457 0.73276182 0.76010338 0.74112632 0.75105359 0.74135713 0.76031401] mean value: 0.7468901068438255 key: test_accuracy value: [0.85585586 0.82882883 0.82882883 0.84684685 0.9009009 0.81981982 0.86486486 0.83783784 0.88181818 0.82727273] mean value: 0.8492874692874692 key: train_accuracy value: [0.87061184 0.86359077 0.87462387 0.87161484 0.86459378 0.8776329 0.86860582 0.87362086 0.86873747 0.87775551] mean value: 0.87113876700241 key: test_fscore value: [0.85714286 0.84033613 0.83185841 0.84955752 0.90756303 0.83333333 0.86956522 0.84745763 0.8907563 0.84552846] mean value: 0.8573098881659105 key: train_fscore value: [0.87795648 0.87072243 0.88218662 0.87924528 0.8708134 0.88403042 0.87488061 0.87954111 0.87511916 0.88425047] mean value: 0.877874598461022 key: test_precision value: [0.84210526 0.78125 0.81034483 0.82758621 0.85714286 0.78125 0.84745763 0.80645161 0.828125 0.76470588] mean value: 0.8146419277158321 key: train_precision value: [0.83154122 0.82820976 0.83274021 0.83065954 0.83180987 0.83935018 0.83424408 0.83941606 
0.83454545 0.83963964] mean value: 0.8342156018881279 key: test_recall value: [0.87272727 0.90909091 0.85454545 0.87272727 0.96428571 0.89285714 0.89285714 0.89285714 0.96363636 0.94545455] mean value: 0.9061038961038961 key: train_recall value: [0.92985972 0.91783567 0.93787575 0.93386774 0.91365462 0.93373494 0.91967871 0.92369478 0.91983968 0.93386774] mean value: 0.9263909344794006 key: test_roc_auc value: [0.85600649 0.82954545 0.82905844 0.84707792 0.90032468 0.81915584 0.86461039 0.83733766 0.88181818 0.82727273] mean value: 0.8492207792207792 key: train_roc_auc value: [0.87055235 0.86353631 0.87456037 0.87155234 0.86464294 0.87768911 0.86865699 0.87367104 0.86873747 0.87775551] mean value: 0.8711354435779189 key: test_jcc value: [0.75 0.72463768 0.71212121 0.73846154 0.83076923 0.71428571 0.76923077 0.73529412 0.8030303 0.73239437] mean value: 0.7510224932902431 key: train_jcc value: [0.78246206 0.77104377 0.78920742 0.78451178 0.77118644 0.79216354 0.77758913 0.78498294 0.7779661 0.79251701] mean value: 0.7823630194686007 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn(
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [3.41872621 3.53579807 3.53889394 3.50232983 3.69363928 3.61123085 1.48464918 3.60396457 3.29749489 3.37618065] mean value: 3.3062907457351685 key: score_time value: [0.01836133 0.01513553 0.01684785 0.01543593 0.01529264 0.0153451 0.01301169 0.02348351 0.01276898 0.02390289] mean value: 0.01695854663848877 key: test_mcc value: [0.91368563 0.89249761 0.85644694 0.76905945 0.88077101 0.87733514 0.82824452 0.83897362 0.89625816 0.74848119] mean value: 0.8501753280181599 key: train_mcc value: [0.99799599 1. 0.99398393 0.99198394 0.99599599 0.99599599 0.87092746 0.99599599 0.99400594 0.997998 ] mean value: 0.983488322406025 key: test_accuracy value: [0.95495495 0.94594595 0.92792793 0.88288288 0.93693694 0.93693694 0.90990991 0.91891892 0.94545455 0.86363636] mean value: 0.9223505323505323 key: train_accuracy value: [0.99899699 1. 0.99699097 0.99598796 0.99799398 0.99799398 0.9327984 0.99799398 0.99699399 0.998998 ] mean value: 0.9914748252774355 key: test_fscore value: [0.95652174 0.94642857 0.92857143 0.88695652 0.94117647 0.94017094 0.91666667 0.92173913 0.94827586 0.87804878] mean value: 0.926455611128696 key: train_fscore value: [0.99899699 1. 
0.996997 0.99598394 0.99799599 0.99799599 0.93625119 0.99799599 0.99698492 0.998999 ] mean value: 0.9918201012630389 key: test_precision value: [0.91666667 0.92982456 0.9122807 0.85 0.88888889 0.90163934 0.859375 0.89830508 0.90163934 0.79411765] mean value: 0.8852737239042626 key: train_precision value: [1. 1. 0.996 0.99798793 0.996 0.996 0.88969259 0.996 1. 0.998 ] mean value: 0.986968051346051 key: test_recall value: [1. 0.96363636 0.94545455 0.92727273 1. 0.98214286 0.98214286 0.94642857 1. 0.98181818] mean value: 0.9728896103896104 key: train_recall value: [0.99799599 1. 0.99799599 0.99398798 1. 1. 0.98795181 1. 0.99398798 1. ] mean value: 0.9971919743100659 key: test_roc_auc value: [0.95535714 0.9461039 0.92808442 0.88327922 0.93636364 0.93652597 0.90925325 0.91866883 0.94545455 0.86363636] mean value: 0.9222727272727272 key: train_roc_auc value: [0.998998 1. 0.99698996 0.99598997 0.99799599 0.99799599 0.93285366 0.99799599 0.99699399 0.998998 ] mean value: 0.9914811550812468 key: test_jcc value: [0.91666667 0.89830508 0.86666667 0.796875 0.88888889 0.88709677 0.84615385 0.85483871 0.90163934 0.7826087 ] mean value: 0.8639739676907268 key: train_jcc value: [0.99799599 1. 
0.99401198 0.992 0.996 0.996 0.88014311 0.996 0.99398798 0.998 ] mean value: 0.9844139056685028 MCC on Blind test: 0.58 Accuracy on Blind test: 0.83 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.05418754 0.04911709 0.04165864 0.04315972 0.05047321 0.0487411 0.04731131 0.04764485 0.04171062 0.0402627 ] mean value: 0.04642667770385742 key: score_time value: [0.00953841 0.00912523 0.00918102 0.00916767 0.00922489 0.00934124 0.0091722 0.00921059 0.00914741 0.00930452] mean value: 0.009241318702697754 key: test_mcc value: [0.94735177 0.89188312 0.84137254 0.91368563 0.91355091 0.89414155 0.87398511 0.89414155 0.94686415 0.94686415] mean value: 0.9063840476442566 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97297297 0.94594595 0.91891892 0.95495495 0.95495495 0.94594595 0.93693694 0.94594595 0.97272727 0.97272727] mean value: 0.9522031122031123 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97345133 0.94545455 0.92173913 0.95652174 0.95726496 0.94827586 0.9380531 0.94827586 0.97345133 0.97345133] mean value: 0.9535939176068668 key: train_fscore value: [1. 1. 1. 1. 1. 
1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94827586 0.94545455 0.88333333 0.91666667 0.91803279 0.91666667 0.92982456 0.91666667 0.94827586 0.94827586] mean value: 0.927147281328353 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.94545455 0.96363636 1. 1. 0.98214286 0.94642857 0.98214286 1. 1. ] mean value: 0.9819805194805195 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97321429 0.94594156 0.91931818 0.95535714 0.95454545 0.94561688 0.93685065 0.94561688 0.97272727 0.97272727] mean value: 0.9521915584415583 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94827586 0.89655172 0.85483871 0.91666667 0.91803279 0.90163934 0.88333333 0.90163934 0.94827586 0.94827586] mean value: 0.9117529495432083 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.66 Accuracy on Blind test: 0.87 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.17229462 0.1718049 0.17255998 0.17445517 0.17378926 0.17340136 0.17458105 0.17313957 0.17255402 0.17457175] mean value: 0.173315167427063 key: score_time value: [0.01905179 0.01922989 0.01923728 0.01930737 0.01945829 0.01919985 0.0191958 0.01988244 0.01879358 0.02033043] mean value: 0.019368672370910646 key: test_mcc value: [0.94735177 0.91003577 0.91127765 0.96396104 0.88077101 0.89704631 0.89242811 0.89242811 0.89625816 0.83984125] mean value: 0.9031399193828875 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97297297 0.95495495 0.95495495 0.98198198 0.93693694 0.94594595 0.94594595 0.94594595 0.94545455 0.91818182] mean value: 0.9503276003276003 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97345133 0.95412844 0.95575221 0.98181818 0.94117647 0.94915254 0.94736842 0.94736842 0.94827586 0.92173913] mean value: 0.952023100957829 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94827586 0.96296296 0.93103448 0.98181818 0.88888889 0.90322581 0.93103448 0.93103448 0.90163934 0.88333333] mean value: 0.9263247828062102 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.94545455 0.98181818 0.98181818 1. 1. 0.96428571 0.96428571 1. 0.96363636] mean value: 0.9801298701298702 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.97321429 0.95487013 0.95519481 0.98198052 0.93636364 0.94545455 0.94577922 0.94577922 0.94545455 0.91818182] mean value: 0.9502272727272727 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94827586 0.9122807 0.91525424 0.96428571 0.88888889 0.90322581 0.9 0.9 0.90163934 0.85483871] mean value: 0.9088689264677418 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.5 Accuracy on Blind test: 0.81 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01375628 0.0127883 0.0128231 0.01299429 0.01285195 0.01358485 0.0128386 0.01285863 0.01297832 0.01316524] mean value: 0.013063955307006835 key: score_time value: [0.00929976 0.00993133 0.00923967 0.009238 0.00926399 0.00926995 0.00940943 0.00925875 0.00927234 0.00982141] mean value: 0.009400463104248047 key: test_mcc value: [0.88102763 0.77216596 0.72112155 0.84137254 0.70489656 0.75979502 0.80802876 0.82447186 0.7793831 0.69564113] mean value: 0.7787904107892152 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93693694 0.88288288 0.84684685 0.91891892 0.83783784 0.87387387 0.9009009 0.90990991 0.88181818 0.83636364] mean value: 0.8826289926289926 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94017094 0.88888889 0.864 0.92173913 0.859375 0.8852459 0.90756303 0.91525424 0.89256198 0.85483871] mean value: 0.8929637816780669 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88709677 0.83870968 0.77142857 0.88333333 0.76388889 0.81818182 0.85714286 0.87096774 0.81818182 0.76811594] mean value: 0.827704742273466 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.94545455 0.98181818 0.96363636 0.98214286 0.96428571 0.96428571 0.96428571 0.98181818 0.96363636] mean value: 0.9711363636363637 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.88344156 0.84805195 0.91931818 0.83652597 0.87305195 0.90032468 0.90941558 0.88181818 0.83636364] mean value: 0.8825811688311689 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88709677 0.8 0.76056338 0.85483871 0.75342466 0.79411765 0.83076923 0.84375 0.80597015 0.74647887] mean value: 0.8077009422008127 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.808599 2.80106878 2.77748418 2.78659678 2.8147788 2.78457665 2.77041578 2.77824712 2.79634237 2.76898098] mean value: 2.788709044456482 key: score_time value: [0.09930849 0.09889126 0.09994721 0.10435081 0.09950209 0.09872508 0.1585269 0.10039687 0.09920406 0.09858346] mean value: 0.1057436227798462 key: test_mcc value: [0.96459895 0.96396104 0.85816689 0.94735177 0.93029809 0.89704631 0.87508299 0.91003577 0.87988269 0.84322091] mean value: 0.9069645418161864 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98198198 0.98198198 0.92792793 0.97297297 0.96396396 0.94594595 0.93693694 0.95495495 0.93636364 0.91818182] mean value: 0.9521212121212121 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98214286 0.98181818 0.92982456 0.97345133 0.96551724 0.94915254 0.93913043 0.95575221 0.94017094 0.92307692] mean value: 0.954003722197022 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96491228 0.98181818 0.89830508 0.94827586 0.93333333 0.90322581 0.91525424 0.94736842 0.88709677 0.87096774] mean value: 0.925055772358941 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.98181818 0.96363636 1. 1. 1. 0.96428571 0.96428571 1. 0.98181818] mean value: 0.9855844155844156 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.98198052 0.92824675 0.97321429 0.96363636 0.94545455 0.93668831 0.95487013 0.93636364 0.91818182] mean value: 0.9520779220779221 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96491228 0.96428571 0.86885246 0.94827586 0.93333333 0.90322581 0.8852459 0.91525424 0.88709677 0.85714286] mean value: 0.912762522612166 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.89 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.13378525 1.13587284 1.13098383 1.13388467 1.16668534 1.18749261 1.24799681 1.14120936 1.1288693 1.20383883] mean value: 1.1610618829727173 key: score_time value: [0.23726964 0.2293191 0.28423762 0.23544788 0.28838253 0.29559493 0.28117895 0.27506924 0.28288841 0.22672176] mean value: 0.26361100673675536 key: test_mcc value: [0.9461039 0.91006494 0.82205752 0.89188312 0.93029809 0.85798501 0.91355091 0.89242811 0.83984125 0.87635609] mean value: 0.888056893772736 key: train_mcc value: [0.96427099 0.95418486 0.95040062 0.95819837 0.96025947 0.94616064 0.96221384 0.95213257 0.95209501 0.95824107] mean value: 0.9558157467241123 key: test_accuracy value: [0.97297297 0.95495495 0.90990991 0.94594595 0.96396396 0.92792793 0.95495495 0.94594595 0.91818182 0.93636364] mean value: 0.9431122031122031 key: train_accuracy value: [0.98194584 0.97693079 0.97492477 0.97893681 0.97993982 0.97291876 0.98094283 0.97592778 0.9759519 0.97895792] mean value: 0.9777377221845899 key: test_fscore value: [0.97297297 0.95495495 0.9122807 0.94545455 0.96551724 0.93103448 0.95726496 0.94736842 0.92173913 0.93913043] mean value: 0.9447717842809771 key: train_fscore value: [0.98221344 0.97725025 0.97536946 0.97922849 0.98019802 0.97324083 0.98116947 0.97619048 0.97619048 0.97922849] mean value: 0.9780279396854765 key: test_precision value: [0.96428571 0.94642857 0.88135593 0.94545455 0.93333333 0.9 0.91803279 0.93103448 0.88333333 0.9 ] mean value: 0.9203258699682755 key: train_precision value: [0.96881092 0.96484375 0.95930233 
0.96679688 0.96679688 0.96086106 0.96868885 0.96470588 0.96660118 0.96679688] mean value: 0.9654204580048241 key: test_recall value: [0.98181818 0.96363636 0.94545455 0.94545455 1. 0.96428571 1. 0.96428571 0.96363636 0.98181818] mean value: 0.971038961038961 key: train_recall value: [0.99599198 0.98997996 0.99198397 0.99198397 0.9939759 0.98594378 0.9939759 0.98795181 0.98597194 0.99198397] mean value: 0.9909743181141399 key: test_roc_auc value: [0.97305195 0.95503247 0.91022727 0.94594156 0.96363636 0.9275974 0.95454545 0.94577922 0.91818182 0.93636364] mean value: 0.9430357142857143 key: train_roc_auc value: [0.98193173 0.97691769 0.97490765 0.97892371 0.97995388 0.97293181 0.98095589 0.97593983 0.9759519 0.97895792] mean value: 0.977737201310251 key: test_jcc value: [0.94736842 0.9137931 0.83870968 0.89655172 0.93333333 0.87096774 0.91803279 0.9 0.85483871 0.8852459 ] mean value: 0.8958841399529021 key: train_jcc value: [0.96504854 0.95551257 0.95192308 0.95930233 0.96116505 0.94787645 0.96303502 0.95348837 0.95348837 0.95930233] mean value: 0.9570142104370474 MCC on Blind test: 0.78 Accuracy on Blind test: 0.91 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, 
booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02807689 0.01736927 0.01711869 0.01721525 0.01714611 0.01708817 0.01749229 0.01700449 0.01726103 0.01719761] mean value: 0.018296980857849122 key: score_time value: [0.01289463 0.01240635 0.01258349 0.01251745 0.01250124 0.01252055 0.01249766 0.01253819 0.01249957 0.01250172] mean value: 0.012546086311340332 key: test_mcc value: [0.60674852 0.53168696 0.51517746 0.56873266 0.71205754 0.51517746 0.67598342 0.58620801 0.73323558 0.56400939] mean value: 0.6009016993199549 key: train_mcc value: [0.63491093 0.63543061 0.62142983 0.61525929 0.6391103 0.66099253 0.62317718 0.65531326 0.65761181 0.63728607] mean value: 0.6380521801166391 key: test_accuracy value: [0.8018018 0.76576577 0.75675676 0.78378378 0.85585586 0.75675676 0.83783784 0.79279279 0.86363636 0.78181818] mean value: 0.7996805896805896 key: train_accuracy value: [0.81745236 0.81745236 0.81043129 0.80742227 0.81945838 0.83049147 0.8114343 0.82748245 0.82865731 0.81863727] mean value: 0.8188919463802228 key: test_fscore value: [0.78846154 0.75925926 0.74285714 0.77358491 0.85964912 0.76923077 0.84210526 0.8 0.87179487 0.78571429] mean value: 0.7992657158943157 key: train_fscore value: [0.81726908 0.81390593 0.80655067 0.80408163 0.82142857 0.82980866 0.80816327 0.83003953 0.83119447 0.81809045] mean value: 0.818053225190839 key: test_precision value: [0.83673469 0.77358491 0.78 0.80392157 0.84482759 0.73770492 0.82758621 0.77966102 0.82258065 0.77192982] mean value: 0.7978531365973461 key: train_precision value: [0.81891348 0.8308977 0.82426778 0.81912682 0.81176471 0.83232323 0.82157676 0.81712062 0.81906615 
0.82056452] mean value: 0.821562177423608 key: test_recall value: [0.74545455 0.74545455 0.70909091 0.74545455 0.875 0.80357143 0.85714286 0.82142857 0.92727273 0.8 ] mean value: 0.802987012987013 key: train_recall value: [0.81563126 0.79759519 0.78957916 0.78957916 0.8313253 0.82730924 0.79518072 0.84337349 0.84368737 0.81563126] mean value: 0.8148892161833707 key: test_roc_auc value: [0.8012987 0.76558442 0.75633117 0.78344156 0.85568182 0.75633117 0.83766234 0.79253247 0.86363636 0.78181818] mean value: 0.7994318181818182 key: train_roc_auc value: [0.81745419 0.81747229 0.81045223 0.80744018 0.81947027 0.83048829 0.81141802 0.82749837 0.82865731 0.81863727] mean value: 0.8188988418604277 key: test_jcc value: [0.65079365 0.6119403 0.59090909 0.63076923 0.75384615 0.625 0.72727273 0.66666667 0.77272727 0.64705882] mean value: 0.6676983915021667 key: train_jcc value: [0.6910017 0.6862069 0.67581475 0.67235495 0.6969697 0.7091222 0.67808219 0.70945946 0.71114865 0.69217687] mean value: 0.6922337365141537 MCC on Blind test: 0.5 Accuracy on Blind test: 0.79 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.16017747 0.1352632 0.1334641 0.28510332 0.13021255 0.1371038 0.13196158 0.13735604 0.136549 0.13277578] mean value: 0.15199668407440187 key: score_time value: [0.01133108 0.01133108 0.01121116 0.01118255 0.01125717 0.01123095 0.01119471 0.01144171 0.01123309 0.01124144] mean value: 0.01126549243927002 key: test_mcc value: [0.94735177 0.94735177 0.91127765 0.91368563 0.91355091 0.87733514 0.91119237 0.96457634 0.87988269 0.91287093] mean value: 0.917907519038506 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97297297 0.97297297 0.95495495 0.95495495 0.95495495 0.93693694 0.95495495 0.98198198 0.93636364 0.95454545] mean value: 0.9575593775593776 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97345133 0.97345133 0.95575221 0.95652174 0.95726496 0.94017094 0.95652174 0.98245614 0.94017094 0.95652174] mean value: 0.9592283062605657 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94827586 0.94827586 0.93103448 0.91666667 0.91803279 0.90163934 0.93220339 0.96551724 0.88709677 0.91666667] mean value: 0.9265409076780793 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.98181818 1. 1. 0.98214286 0.98214286 1. 1. 1. ] mean value: 0.9946103896103896 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.97321429 0.97321429 0.95519481 0.95535714 0.95454545 0.93652597 0.95470779 0.98181818 0.93636364 0.95454545] mean value: 0.9575487012987013 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94827586 0.94827586 0.91525424 0.91666667 0.91803279 0.88709677 0.91666667 0.96551724 0.88709677 0.91666667] mean value: 0.9219549538077719 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.73 Accuracy on Blind test: 0.89 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05546999 0.08607674 0.0726912 0.08219934 0.06160903 0.08801007 0.0710063 0.07505107 0.10596418 0.08573651] mean value: 0.07838144302368164 key: score_time value: [0.01859093 0.01929092 0.01232386 0.01251221 0.01444817 0.01904821 0.01238537 0.01876664 0.04351854 0.02110696] mean value: 0.019199180603027343 key: test_mcc value: [0.73090707 0.69483296 0.65875884 0.67619361 0.76868784 0.74772727 0.79177679 0.62617314 0.72739297 0.6634888 ] mean value: 0.7085939295611897 key: train_mcc value: [0.83152863 0.79823275 0.82348413 0.82612027 0.80931961 0.82116323 0.81783952 0.80406528 0.81161814 0.82958324] mean value: 0.817295481744602 key: test_accuracy value: 
[0.86486486 0.84684685 0.82882883 0.83783784 0.88288288 0.87387387 0.89189189 0.81081081 0.86363636 0.82727273] mean value: 0.8528746928746929 key: train_accuracy value: [0.91474423 0.89869609 0.9107322 0.91173521 0.90371113 0.90972919 0.90772317 0.90170512 0.90480962 0.91382766] mean value: 0.9077413603536059 key: test_fscore value: [0.86725664 0.84955752 0.83185841 0.83928571 0.88888889 0.875 0.9 0.82352941 0.86238532 0.84033613] mean value: 0.8578098036865689 key: train_fscore value: [0.91771539 0.90107738 0.91384318 0.91522158 0.90679612 0.91245136 0.91102515 0.90354331 0.90803485 0.91666667] mean value: 0.9106374969508796 key: test_precision value: [0.84482759 0.82758621 0.81034483 0.8245614 0.85245902 0.875 0.84375 0.77777778 0.87037037 0.78125 ] mean value: 0.8307927188740017 key: train_precision value: [0.88764045 0.88122605 0.88389513 0.8812616 0.87781955 0.88490566 0.87873134 0.88610039 0.87827715 0.88742964] mean value: 0.8827286965430265 key: test_recall value: [0.89090909 0.87272727 0.85454545 0.85454545 0.92857143 0.875 0.96428571 0.875 0.85454545 0.90909091] mean value: 0.8879220779220779 key: train_recall value: [0.9498998 0.92184369 0.94589178 0.95190381 0.937751 0.94176707 0.94578313 0.92168675 0.93987976 0.94789579] mean value: 0.9404302581065745 key: test_roc_auc value: [0.8650974 0.84707792 0.82905844 0.83798701 0.88246753 0.87386364 0.89123377 0.81022727 0.86363636 0.82727273] mean value: 0.8527922077922078 key: train_roc_auc value: [0.91470894 0.89867285 0.9106969 0.91169488 0.90374524 0.90976129 0.90776131 0.90172514 0.90480962 0.91382766] mean value: 0.9077403803591118 key: test_jcc value: [0.765625 0.73846154 0.71212121 0.72307692 0.8 0.77777778 0.81818182 0.7 0.75806452 0.72463768] mean value: 0.7517946466907722 key: train_jcc value: [0.84794275 0.81996435 0.84135472 0.84369449 0.8294849 0.83899821 0.8365897 0.82405745 0.83156028 0.84615385] mean value: 0.8359800713703212 MCC on Blind test: 0.59 Accuracy on Blind test: 0.83 Model_name: 
Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge 
ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02062082 0.01450729 0.01637959 0.01642632 0.01640606 0.01634121 0.01633549 0.01689839 0.01637173 0.0293107 ] mean value: 0.01795976161956787 key: score_time value: [0.01051903 0.0122385 0.01214838 0.0124464 0.01239777 0.01236987 0.01209283 0.01214767 0.01245856 0.01252127] mean value: 0.012134027481079102 key: test_mcc value: [0.60383519 0.55139323 0.4778799 0.56873266 0.74772727 0.53149351 0.6576811 0.60540128 0.63678479 0.52762168] mean value: 0.5908550608068347 key: train_mcc value: [0.59486707 0.60893242 0.60503014 0.59892457 0.58902021 0.60720182 0.58895544 0.59886665 0.60359924 0.59950665] mean value: 0.5994904218517351 key: test_accuracy value: [0.8018018 0.77477477 0.73873874 0.78378378 0.87387387 0.76576577 0.82882883 0.8018018 0.81818182 0.76363636] mean value: 0.7951187551187551 key: train_accuracy value: [0.79739218 0.80441324 0.80240722 0.79939819 0.79438315 0.80341023 0.79438315 0.79939819 0.80160321 0.7995992 ] mean value: 0.799638796147963 key: test_fscore value: [0.7962963 0.76190476 0.72897196 0.77358491 0.875 0.76785714 0.83185841 0.7962963 0.82142857 0.76785714] mean value: 0.7921055486997057 key: train_fscore value: [0.7959596 0.80283114 0.8 0.79757085 0.79102956 0.799591 0.79145473 0.79757085 
0.79795918 0.79633401] mean value: 0.7970300928959978 key: test_precision value: [0.81132075 0.8 0.75 0.80392157 0.875 0.76785714 0.8245614 0.82692308 0.80701754 0.75438596] mean value: 0.8020987455405354 key: train_precision value: [0.80244399 0.81020408 0.81069959 0.80572597 0.80331263 0.81458333 0.80206186 0.80408163 0.81288981 0.80952381] mean value: 0.8075526706803229 key: test_recall value: [0.78181818 0.72727273 0.70909091 0.74545455 0.875 0.76785714 0.83928571 0.76785714 0.83636364 0.78181818] mean value: 0.7831818181818182 key: train_recall value: [0.78957916 0.79559118 0.78957916 0.78957916 0.77911647 0.78514056 0.7811245 0.79116466 0.78356713 0.78356713] mean value: 0.7868009110590659 key: test_roc_auc value: [0.80162338 0.77435065 0.73847403 0.78344156 0.87386364 0.76574675 0.82873377 0.80211039 0.81818182 0.76363636] mean value: 0.7950162337662338 key: train_roc_auc value: [0.79740002 0.8044221 0.8024201 0.79940805 0.79436785 0.80339192 0.79436986 0.79938994 0.80160321 0.7995992 ] mean value: 0.7996372262597484 key: test_jcc value: [0.66153846 0.61538462 0.57352941 0.63076923 0.77777778 0.62318841 0.71212121 0.66153846 0.6969697 0.62318841] mean value: 0.6576005679458365 key: train_jcc value: [0.66107383 0.67060811 0.66666667 0.66329966 0.65430017 0.66609881 0.65488215 0.66329966 0.66383701 0.66159052] mean value: 0.6625656594308654 MCC on Blind test: 0.48 Accuracy on Blind test: 0.78 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03798008 0.03545737 0.03012061 0.03877425 0.0317688 0.03150177 0.03185892 0.03116226 0.02460456 0.03471065] mean value: 0.03279392719268799 key: score_time value: [0.01224113 0.01254487 0.01250696 0.01257157 0.01247406 0.01254869 0.01248217 0.01216841 0.01248527 0.01256871] mean value: 0.012459182739257812 key: test_mcc value: [0.66004053 0.72396756 0.19509708 0.73247207 0.82027988 0.4414112 0.82027988 0.38670108 0.58911518 0.59648483] mean value: 0.5965849292121154 key: train_mcc value: [0.78537137 0.78847029 0.1796513 0.78565201 0.79562144 0.52187278 0.78153894 0.52435675 0.67276339 0.72676811] mean value: 0.6562066371914825 key: test_accuracy value: [0.82882883 0.85585586 0.54054054 0.86486486 0.90990991 0.67567568 0.90990991 0.64864865 0.78181818 0.78181818] mean value: 0.7797870597870598 key: train_accuracy value: [0.89267803 0.89067202 0.53259779 0.89167503 0.89769308 0.72316951 0.89067202 0.7221665 0.82364729 0.85571142] mean value: 0.8120682689350617 key: test_fscore value: [0.81904762 0.86666667 0.13559322 0.85714286 0.9122807 0.75342466 0.9122807 0.49350649 0.74468085 0.8125 ] mean value: 0.7307123768809468 key: train_fscore value: [0.89246231 0.89765258 0.12734082 0.8875 0.89880952 0.77990431 0.8917577 0.62106703 0.79582367 0.86909091] mean value: 0.766140885029211 key: test_precision value: [0.86 0.8 1. 
0.9 0.89655172 0.61111111 0.89655172 0.9047619 0.8974359 0.71232877] mean value: 0.8478741128708063 key: train_precision value: [0.89516129 0.84452297 0.97142857 0.92407809 0.88823529 0.6468254 0.88212181 0.97424893 0.94490358 0.7953411 ] mean value: 0.8766867025939546 key: test_recall value: [0.78181818 0.94545455 0.07272727 0.81818182 0.92857143 0.98214286 0.92857143 0.33928571 0.63636364 0.94545455] mean value: 0.7378571428571429 key: train_recall value: [0.88977956 0.95791583 0.06813627 0.85370741 0.90963855 0.98192771 0.90160643 0.45582329 0.68737475 0.95791583] mean value: 0.766382564325438 key: test_roc_auc value: [0.82840909 0.85665584 0.53636364 0.86444805 0.90974026 0.67288961 0.90974026 0.65146104 0.78181818 0.78181818] mean value: 0.7793344155844155 key: train_roc_auc value: [0.89268094 0.8906045 0.53306412 0.89171315 0.89770505 0.72342879 0.89068297 0.72189962 0.82364729 0.85571142] mean value: 0.8121137858045409 key: test_jcc value: [0.69354839 0.76470588 0.07272727 0.75 0.83870968 0.6043956 0.83870968 0.32758621 0.59322034 0.68421053] mean value: 0.6167813573606694 key: train_jcc value: [0.80580762 0.81431005 0.068 0.79775281 0.81621622 0.63921569 0.8046595 0.45039683 0.66088632 0.76848875] mean value: 0.6625733774522629 MCC on Blind test: 0.47 Accuracy on Blind test: 0.67 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random 
Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.04141402 0.03731918 0.03793001 0.03758025 0.03285313 0.03277278 0.03082371 0.05231643 0.0315969 0.02983832] mean value: 0.036444473266601565 key: score_time value: [0.01278472 0.01252007 0.01252818 0.01277971 0.01252961 0.01251364 0.0123961 0.01258612 0.01246643 0.01259232] mean value: 0.012569689750671386 key: test_mcc value: [0.72978244 0.69891539 0.74772727 0.61746362 0.75009679 0.78434561 0.7964953 0.77224584 0.76477489 0.67451348] mean value: 0.7336360627410452 key: train_mcc value: [0.80907859 0.75937672 0.83892384 0.67525521 0.74664046 0.82363952 0.74461161 0.79763036 0.80597545 0.75551102] mean value: 0.7756642761750585 key: test_accuracy value: [0.86486486 0.84684685 0.87387387 0.8018018 0.86486486 0.89189189 0.89189189 0.87387387 0.88181818 0.83636364] mean value: 0.8628091728091728 key: train_accuracy value: [0.90371113 0.87462387 0.91875627 0.82848546 0.86459378 0.91173521 0.86258776 0.89267803 0.90280561 0.87274549] mean value: 0.883272261674804 key: test_fscore value: [0.86238532 0.83495146 0.87272727 0.7755102 0.88 0.89090909 0.90163934 0.88888889 0.88495575 0.83018868] mean value: 0.862215600973845 key: train_fscore value: [0.90679612 0.86368593 0.9211295 0.80634202 0.87760653 0.91252485 0.87646528 0.90120037 0.90424482 0.86150491] mean value: 0.8831500324767252 key: test_precision value: [0.87037037 0.89583333 0.87272727 0.88372093 0.79710145 0.90740741 0.83333333 0.8 0.86206897 0.8627451 ] mean value: 0.8585308160236095 key: train_precision value: [0.87947269 0.94736842 0.89583333 0.92708333 0.8 0.90354331 0.79541735 
0.83418803 0.89105058 0.94497608] mean value: 0.8818933130847411 key: test_recall value: [0.85454545 0.78181818 0.87272727 0.69090909 0.98214286 0.875 0.98214286 1. 0.90909091 0.8 ] mean value: 0.8748376623376624 key: train_recall value: [0.93587174 0.79358717 0.94789579 0.71342685 0.97188755 0.92168675 0.97590361 0.97991968 0.91783567 0.79158317] mean value: 0.894959799116305 key: test_roc_auc value: [0.86477273 0.84626623 0.87386364 0.80081169 0.8637987 0.89204545 0.89107143 0.87272727 0.88181818 0.83636364] mean value: 0.8623538961038961 key: train_roc_auc value: [0.90367884 0.87470523 0.91872701 0.82860098 0.86470129 0.91174518 0.86270131 0.89276545 0.90280561 0.87274549] mean value: 0.8833176392946536 key: test_jcc value: [0.75806452 0.71666667 0.77419355 0.63333333 0.78571429 0.80327869 0.82089552 0.8 0.79365079 0.70967742] mean value: 0.7595474774148697 key: train_jcc value: [0.8294849 0.76007678 0.85379061 0.67552182 0.7819063 0.83912249 0.78009631 0.82016807 0.82522523 0.75670498] mean value: 0.7922097481345935 MCC on Blind test: 0.63 Accuracy on Blind test: 0.83 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.3187921 0.29844427 0.30731273 0.29335833 0.29589033 0.29770684 0.29465437 0.29198313 0.29960108 0.28603816] mean value: 0.29837813377380373 key: score_time value: [0.0161016 0.01719046 0.01659346 0.01645303 0.01635051 0.01641083 0.01618958 0.01644993 0.01580024 0.0157938 ] mean value: 0.016333341598510742 key: test_mcc value: [0.89249761 0.85584416 0.87398511 0.82480596 0.86471225 0.84111937 0.85798501 0.93029809 0.85967619 0.87635609] mean value: 0.8677279843941925 key: train_mcc value: [0.93607708 0.93628772 0.93240093 0.94638611 0.94833373 0.94431975 0.94237551 0.92853373 0.9465782 0.94425948] mean value: 0.9405552228633982 key: test_accuracy value: [0.94594595 0.92792793 0.93693694 0.90990991 0.92792793 0.91891892 0.92792793 0.96396396 0.92727273 0.93636364] mean value: 0.9323095823095823 key: train_accuracy value: [0.96790371 0.96790371 0.96589769 0.97291876 0.97392177 0.97191575 0.97091274 0.96389168 0.97294589 0.97194389] mean value: 0.9700155576951295 key: test_fscore value: [0.94642857 0.92727273 0.93577982 0.9137931 0.93333333 0.92307692 0.93103448 0.96551724 0.93103448 0.93913043] mean value: 0.9346401116752753 key: train_fscore value: [0.96831683 0.96844181 0.96653543 0.97339901 0.9743083 0.97233202 0.97137216 0.96456693 0.97345133 0.97233202] mean value: 0.9705055844606678 key: test_precision value: [0.92982456 0.92727273 0.94444444 0.86885246 0.875 0.8852459 0.9 0.93333333 0.8852459 0.9 ] mean value: 0.9049219328749096 key: train_precision value: [0.95694716 0.95339806 0.94970986 0.95736434 0.95914397 0.95719844 0.95533981 0.94594595 
0.95559846 0.95906433] mean value: 0.9549710373674181 key: test_recall value: [0.96363636 0.92727273 0.92727273 0.96363636 1. 0.96428571 0.96428571 1. 0.98181818 0.98181818] mean value: 0.9674025974025974 key: train_recall value: [0.97995992 0.98396794 0.98396794 0.98997996 0.98995984 0.98795181 0.98795181 0.98393574 0.99198397 0.98597194] mean value: 0.9865630860113802 key: test_roc_auc value: [0.9461039 0.92792208 0.93685065 0.91038961 0.92727273 0.91850649 0.9275974 0.96363636 0.92727273 0.93636364] mean value: 0.9321915584415584 key: train_roc_auc value: [0.96789161 0.96788758 0.96587955 0.97290163 0.97393784 0.97193182 0.97092981 0.96391176 0.97294589 0.97194389] mean value: 0.9700161366910527 key: test_jcc value: [0.89830508 0.86440678 0.87931034 0.84126984 0.875 0.85714286 0.87096774 0.93333333 0.87096774 0.8852459 ] mean value: 0.877594962649071 key: train_jcc value: [0.93857965 0.93881453 0.9352381 0.94817658 0.94990366 0.94615385 0.94433781 0.93155894 0.94827586 0.94615385] mean value: 0.9427192827315077 MCC on Blind test: 0.8 Accuracy on Blind test: 0.92 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1,
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.22250032 0.21884203 0.22811842 0.22957635 0.23531556 0.22903347 0.20518231 0.22516227 0.24410057 0.22989869] mean value: 0.22677299976348878 key: score_time value: [0.04108667 0.02354312 0.03398275 0.04121399 0.03230572 0.03941464 0.02867675 0.03275776 0.03820419 0.04190421] mean value: 0.03530898094177246 key: test_mcc value: [0.92854828 0.92854828 0.87520354 0.89249761 0.86471225 0.89704631 0.856354 0.94730174 0.86373129 0.87988269] mean value: 0.8933825983815884 key: train_mcc value: [0.98998777 0.99198387 0.99599596 0.98398339 0.99799599 0.98997191 
0.99399998 0.99399998 0.99002966 0.99400594] mean value: 0.9921954443123457 key: test_accuracy value: [0.96396396 0.96396396 0.93693694 0.94594595 0.92792793 0.94594595 0.92792793 0.97297297 0.92727273 0.93636364] mean value: 0.9449221949221949 key: train_accuracy value: [0.99498495 0.99598796 0.99799398 0.99197593 0.99899699 0.99498495 0.99699097 0.99699097 0.99498998 0.99699399] mean value: 0.9960890688096353 key: test_fscore value: [0.96428571 0.96428571 0.9380531 0.94642857 0.93333333 0.94915254 0.92982456 0.97391304 0.93220339 0.94017094] mean value: 0.9471650907934566 key: train_fscore value: [0.995005 0.996 0.998 0.99201597 0.99899699 0.99498495 0.996997 0.996997 0.99501496 0.997003 ] mean value: 0.9961014855037967 key: test_precision value: [0.94736842 0.94736842 0.9137931 0.92982456 0.875 0.90322581 0.9137931 0.94915254 0.87301587 0.88709677] mean value: 0.913963860643924 key: train_precision value: [0.99203187 0.99401198 0.99600798 0.98807157 0.99799599 0.99398798 0.99401198 0.99401198 0.99007937 0.9940239 ] mean value: 0.9934234592659856 key: test_recall value: [0.98181818 0.98181818 0.96363636 0.96363636 1. 1. 0.94642857 1. 1. 1. ] mean value: 0.9837337662337662 key: train_recall value: [0.99799599 0.99799599 1. 0.99599198 1. 0.99598394 1. 1. 1. 1. 
] mean value: 0.9987967903678844 key: test_roc_auc value: [0.96412338 0.96412338 0.93717532 0.9461039 0.92727273 0.94545455 0.92775974 0.97272727 0.92727273 0.93636364] mean value: 0.9448376623376623 key: train_roc_auc value: [0.99498193 0.99598595 0.99799197 0.9919719 0.998998 0.99498596 0.99699399 0.99699399 0.99498998 0.99699399] mean value: 0.9960887638731277 key: test_jcc value: [0.93103448 0.93103448 0.88333333 0.89830508 0.875 0.90322581 0.86885246 0.94915254 0.87301587 0.88709677] mean value: 0.9000050838646646 key: train_jcc value: [0.99005964 0.99203187 0.99600798 0.98415842 0.99799599 0.99001996 0.99401198 0.99401198 0.99007937 0.9940239 ] mean value: 0.992240108815205 MCC on Blind test: 0.73 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, 
predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.45735979 0.48779058 0.38904047 0.40051913 0.49239159 0.37844515 0.39243364 0.41860628 0.4010551 0.42596316] mean value: 0.42436048984527586 key: score_time value: [0.02394366 0.04344606 0.02420187 0.03411937 0.04596186 0.03702545 0.03489757 0.02422857 0.02380872 0.03269696] mean value: 0.03243300914764404 key: test_mcc value: [0.86504296 0.85816689 0.805216 0.82480596 0.81228039 0.81228039 0.75979502 0.73528651 0.78590525 0.75346772] mean value: 0.8012247103768273 key: train_mcc value: [0.97605307 0.97408004 0.97219678 0.96221188 0.96613365 0.97211189 0.97006835 0.96417189 0.97410635 0.97800501] mean value: 0.9709138904055334 key: test_accuracy value: [0.92792793 0.92792793 0.9009009 0.90990991 0.9009009 0.9009009 0.87387387 0.86486486 0.88181818 0.87272727] mean value: 0.8961752661752662 key: train_accuracy value: [0.98796389 0.98696088 0.98595787 0.98094283 0.98294885 0.98595787 0.98495486 0.98194584 0.98697395 0.98897796] mean value: 0.9853584802503703 key: test_fscore value: [0.93220339 0.92982456 0.90434783 0.9137931 0.90909091 0.90909091 0.8852459 0.87394958 0.89430894 0.88135593] mean value: 0.9033211055715166 key: train_fscore value: [0.98807157 0.98709037 0.98613861 0.98120673 0.98311817 0.9860835 0.98507463 0.98214286 0.98709037 0.9890329 ] mean value: 0.9855049702408853 key: test_precision value: [0.87301587 0.89830508 0.86666667 0.86885246 0.84615385 0.84615385 0.81818182 0.82539683 0.80882353 0.82539683] mean value: 0.8476946774139622 key: train_precision value: [0.98027613 0.97834646 0.97455969 0.96875 0.97249509 0.97637795 
0.97633136 0.97058824 0.97834646 0.98412698] mean value: 0.9760198355928966 key: test_recall value: [1. 0.96363636 0.94545455 0.96363636 0.98214286 0.98214286 0.96428571 0.92857143 1. 0.94545455] mean value: 0.9675324675324675 key: train_recall value: [0.99599198 0.99599198 0.99799599 0.99398798 0.9939759 0.99598394 0.9939759 0.9939759 0.99599198 0.99398798] mean value: 0.9951859542377929 key: test_roc_auc value: [0.92857143 0.92824675 0.9012987 0.91038961 0.90016234 0.90016234 0.87305195 0.86428571 0.88181818 0.87272727] mean value: 0.8960714285714286 key: train_roc_auc value: [0.98795583 0.98695182 0.98594579 0.98092973 0.9829599 0.98596792 0.9849639 0.98195789 0.98697395 0.98897796] mean value: 0.9853584679398959 key: test_jcc value: [0.87301587 0.86885246 0.82539683 0.84126984 0.83333333 0.83333333 0.79411765 0.7761194 0.80882353 0.78787879] mean value: 0.824214103270005 key: train_jcc value: [0.97642436 0.9745098 0.97265625 0.9631068 0.96679688 0.97254902 0.97058824 0.96491228 0.9745098 0.97830375] mean value: 0.9714357173590997 MCC on Blind test: 0.49 Accuracy on Blind test: 0.8 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: 
UserWarning: Variables are collinear warnings.warn("Variables are collinear") Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.30451274 1.32792377 1.30644488 1.30304623 1.33850408 1.34490895 1.34416556 1.32531881 1.29144812 1.29360223] mean value: 1.3179875373840333 key: score_time value: [0.00983834 0.01067233 0.01020646 0.01000905 0.01045823 0.01032877 0.01025462 0.00977778 0.00993848 0.00966263] mean value: 0.010114669799804688 key: test_mcc value: [0.94735177 0.93038564 0.91127765 0.8972375 0.88077101 0.87733514 0.89414155 0.93029809 0.86373129 0.91287093] mean value: 0.9045400572930724 key: train_mcc value: [0.98803531 0.9900196 0.9900196 0.9900196 0.98605528 0.98803559 0.98803559 0.9900198 0.98409441 0.99201584] mean value: 0.9886350623024839 key: test_accuracy value: [0.97297297 0.96396396 0.95495495 0.94594595 0.93693694 0.93693694 0.94594595 0.96396396 0.92727273 0.95454545] mean value: 0.9503439803439804 key: train_accuracy value: [0.99398195 0.99498495 0.99498495 0.99498495 0.99297894 0.99398195 0.99398195 0.99498495 0.99198397 0.99599198] mean value: 0.9942840545685152 key: test_fscore value: [0.97345133 0.96491228 0.95575221 0.94827586 0.94117647 0.94017094 0.94827586 0.96551724 0.93220339 0.95652174] mean value: 0.9526257325762123 key: train_fscore value: [0.9940239 0.99501496 0.99501496 0.99501496 0.99302094 0.99401198 0.99401198 0.995005 0.99204771 0.99600798] mean value: 0.9943174351825127 key: test_precision value: [0.94827586 0.93220339 0.93103448 0.90163934 0.88888889 0.90163934 0.91666667 0.93333333 0.87301587 0.91666667] mean value: 0.9143363851754114 key: train_precision value: [0.98811881 0.99007937 0.99007937 0.99007937 0.98613861 
0.98809524 0.98809524 0.99005964 0.98422091 0.99204771] mean value: 0.9887014260333787 key: test_recall value: [1. 1. 0.98181818 1. 1. 0.98214286 0.98214286 1. 1. 1. ] mean value: 0.9946103896103896 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97321429 0.96428571 0.95519481 0.94642857 0.93636364 0.93652597 0.94561688 0.96363636 0.92727273 0.95454545] mean value: 0.9503084415584415 key: train_roc_auc value: [0.9939759 0.99497992 0.99497992 0.99497992 0.99298597 0.99398798 0.99398798 0.99498998 0.99198397 0.99599198] mean value: 0.9942843518362025 key: test_jcc value: [0.94827586 0.93220339 0.91525424 0.90163934 0.88888889 0.88709677 0.90163934 0.93333333 0.87301587 0.91666667] mean value: 0.909801371381051 key: train_jcc value: [0.98811881 0.99007937 0.99007937 0.99007937 0.98613861 0.98809524 0.98809524 0.99005964 0.98422091 0.99204771] mean value: 0.9887014260333787 MCC on Blind test: 0.77 Accuracy on Blind test: 0.91 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04034972 0.04601431 0.0428679 0.04311156 0.05195618 0.05159187 0.04370213 0.04357862 0.04359293 0.05074596] mean value: 0.045751118659973146 key: score_time value: [0.01296997 0.01311469 0.01335645 0.01352835 0.01321268 0.01309395 0.01326084 0.01323462 0.01333642 0.01326323] mean value: 0.0132371187210083 key: test_mcc value: [0.37650043 0.4359956 0.40253236 0.53411996 0.38789039 0.41128197 0.3513061 0.3513061 0.34992711 0.33333333] mean value: 0.3934193366468438 key: train_mcc value: [0.37027674 0.37377607 0.38245279 0.55233529 0.50102346 0.45326411 0.3885448 0.39195242 0.37666488 0.38184999] mean value: 0.41721405362271696 key: test_accuracy value: [0.62162162 0.65765766 0.64864865 0.72972973 0.66666667 0.66666667 0.61261261 0.61261261 0.60909091 0.6 ] mean value: 0.6425307125307125 key: train_accuracy value: [0.62086259 0.62286861 0.62788365 0.7442327 0.70611836 0.67402207 0.63089268 0.6328987 0.6242485 0.62725451] mean value: 0.6511282344026066 key: test_fscore value: [0.72368421 0.74324324 0.73469388 0.7826087 0.73758865 0.74482759 0.72258065 0.72258065 0.71895425 0.71428571] mean value: 0.7345047518636227 key: train_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [0.7252907 0.72634643 0.72899927 0.79285134 0.77091478 0.75285171 0.73020528 0.73127753 0.72687546 0.72846715] mean value: 0.7414079649678472 key: test_precision value: [0.56701031 0.59139785 0.58695652 0.65060241 0.61176471 0.60674157 0.56565657 
0.56565657 0.56122449 0.55555556] mean value: 0.5862566545699067 key: train_precision value: [0.56898518 0.57028571 0.57356322 0.66666667 0.631242 0.60587515 0.57505774 0.57638889 0.57093822 0.57290471] mean value: 0.5911907474465508 key: test_recall value: [1. 1. 0.98181818 0.98181818 0.92857143 0.96428571 1. 1. 1. 1. ] mean value: 0.9856493506493507 key: train_recall value: [1. 1. 1. 0.97795591 0.98995984 0.9939759 1. 1. 1. 1. ] mean value: 0.9961891654795535 key: test_roc_auc value: [0.625 0.66071429 0.65162338 0.73198052 0.66428571 0.66396104 0.60909091 0.60909091 0.60909091 0.6 ] mean value: 0.6424837662337662 key: train_roc_auc value: [0.62048193 0.62248996 0.62751004 0.74399804 0.70640277 0.67434266 0.63126253 0.63326653 0.6242485 0.62725451] mean value: 0.6511257454668373 key: test_jcc value: [0.56701031 0.59139785 0.58064516 0.64285714 0.58426966 0.59340659 0.56565657 0.56565657 0.56122449 0.55555556] mean value: 0.5807679895880729 key: train_jcc value: [0.56898518 0.57028571 0.57356322 0.65679677 0.62722646 0.60365854 0.57505774 0.57638889 0.57093822 0.57290471] mean value: 0.5895805426902528 MCC on Blind test: 0.21 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive 
Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02735806 0.04222441 0.04203415 0.04227781 0.0215807 0.03210664 0.01891565 0.03945494 0.05434728 0.04150367] mean value: 0.03618032932281494 key: score_time value: [0.01939583 0.02228022 0.01953959 0.01950502 0.01263547 0.01240468 0.01373148 0.01946378 0.0194211 0.02454901] mean value: 0.018292617797851563 key: test_mcc value: [0.78376623 0.75237443 0.67619361 0.80305531 0.8049036 0.74951538 0.83897362 0.74951538 0.76477489 0.64715023] mean value: 0.757022267046191 key: train_mcc value: [0.79982796 0.79935063 0.81028734 0.80841289 0.795357 0.79725623 0.79748273 0.7865219 0.79388846 0.80124844] mean value: 0.7989633567898495 key: test_accuracy value: [0.89189189 0.87387387 0.83783784 0.9009009 0.9009009 0.87387387 0.91891892 0.87387387 0.88181818 0.81818182] mean value: 0.8772072072072072 key: train_accuracy value: [0.89869609 0.89869609 0.90371113 0.90270812 0.89669007 0.89769308 0.89769308 0.89267803 0.89579158 0.8997996 ] mean value: 0.8984156879456003 key: test_fscore value: [0.89090909 0.87931034 0.83928571 0.90265487 0.90598291 0.87931034 0.92173913 0.87931034 0.88495575 0.83333333] mean value: 0.8816791828897612 key: train_fscore value: [0.90260366 0.90222652 0.90769231 0.90682037 0.90009699 0.90097087 0.90116279 0.89540567 0.8996139 0.90291262] mean value: 0.9019505710094796 key: test_precision value: [0.89090909 0.83606557 0.8245614 0.87931034 0.86885246 0.85 0.89830508 0.85 0.86206897 0.76923077] mean value: 0.8529303691526108 key: train_precision value: [0.86988848 0.87265918 0.87245841 0.87084871 0.87054409 0.87218045 0.87078652 0.87238095 
0.86778399 0.87570621] mean value: 0.8715236980915356 key: test_recall value: [0.89090909 0.92727273 0.85454545 0.92727273 0.94642857 0.91071429 0.94642857 0.91071429 0.90909091 0.90909091] mean value: 0.9132467532467532 key: train_recall value: [0.93787575 0.93386774 0.94589178 0.94589178 0.93172691 0.93172691 0.93373494 0.91967871 0.93386774 0.93186373] mean value: 0.9346125986913586 key: test_roc_auc value: [0.89188312 0.87435065 0.83798701 0.90113636 0.90048701 0.87353896 0.91866883 0.87353896 0.88181818 0.81818182] mean value: 0.8771590909090909 key: train_roc_auc value: [0.89865675 0.89866078 0.90366878 0.90266477 0.89672518 0.89772718 0.89772919 0.89270509 0.89579158 0.8997996 ] mean value: 0.8984128900371023 key: test_jcc value: [0.80327869 0.78461538 0.72307692 0.82258065 0.828125 0.78461538 0.85483871 0.78461538 0.79365079 0.71428571] mean value: 0.7893682628222884 key: train_jcc value: [0.82249561 0.82186949 0.83098592 0.82952548 0.81834215 0.81978799 0.82010582 0.81061947 0.81754386 0.82300885] mean value: 0.8214284629540267 MCC on Blind test: 0.59 Accuracy on Blind test: 0.82 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:196: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_cd_7030.py:199: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic 
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', 
transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.35478425 0.34318781 0.25689292 0.37095976 0.29249549 0.43200922 0.29593349 0.31350493 0.3035562 0.22859263] mean value: 0.3191916704177856 key: score_time value: [0.01961803 0.01251912 0.01954269 0.01953173 0.01246142 0.01285362 0.01274061 0.02135682 0.024894 0.0129559 ] mean value: 0.016847395896911622 key: test_mcc value: [0.78376623 0.75237443 0.67762003 0.7137294 0.82182846 0.74951538 0.83897362 0.74951538 0.76477489 0.64715023] mean value: 0.7499248050806804 key: train_mcc value: [0.79982796 0.79935063 0.82538044 0.81028734 0.7968424 0.79725623 0.79748273 0.7865219 0.79388846 0.80124844] mean value: 0.800808652175234 key: test_accuracy value: [0.89189189 0.87387387 0.83783784 0.85585586 0.90990991 0.87387387 0.91891892 0.87387387 0.88181818 0.81818182] mean value: 0.8736036036036036 key: train_accuracy value: [0.89869609 0.89869609 0.91173521 0.90371113 0.89769308 0.89769308 0.89769308 0.89267803 0.89579158 0.8997996 ] mean value: 0.8994186969726816 key: test_fscore value: [0.89090909 0.87931034 0.84210526 0.85964912 0.9137931 0.87931034 0.92173913 0.87931034 0.88495575 0.83333333] mean value: 0.8784415830785542 key: train_fscore value: [0.90260366 0.90222652 0.91472868 0.90769231 0.9005848 0.90097087 0.90116279 0.89540567 0.8996139 0.90291262] mean value: 0.9027901829342879 key: test_precision value: [0.89090909 0.83606557 
0.81355932 0.83050847 0.88333333 0.85 0.89830508 0.85 0.86206897 0.76923077] mean value: 0.8483980614116859 key: train_precision value: [0.86988848 0.87265918 0.88555347 0.87245841 0.875 0.87218045 0.87078652 0.87238095 0.86778399 0.87570621] mean value: 0.873439765329131 key: test_recall value: [0.89090909 0.92727273 0.87272727 0.89090909 0.94642857 0.91071429 0.94642857 0.91071429 0.90909091 0.90909091] mean value: 0.9114285714285714 key: train_recall value: [0.93787575 0.93386774 0.94589178 0.94589178 0.92771084 0.93172691 0.93373494 0.91967871 0.93386774 0.93186373] mean value: 0.9342109922656558 key: test_roc_auc value: [0.89188312 0.87435065 0.83814935 0.85616883 0.90957792 0.87353896 0.91866883 0.87353896 0.88181818 0.81818182] mean value: 0.8735876623376624 key: train_roc_auc value: [0.89865675 0.89866078 0.91170091 0.90366878 0.89772316 0.89772718 0.89772919 0.89270509 0.89579158 0.8997996 ] mean value: 0.899416302484487 key: test_jcc value: [0.80327869 0.78461538 0.72727273 0.75384615 0.84126984 0.78461538 0.85483871 0.78461538 0.79365079 0.71428571] mean value: 0.7842288782373393 key: train_jcc value: [0.82249561 0.82186949 0.84285714 0.83098592 0.81914894 0.81978799 0.82010582 0.81061947 0.81754386 0.82300885] mean value: 0.8228423073588096 MCC on Blind test: 0.59 Accuracy on Blind test: 0.82
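Note on reading the output above: each `key: ... value: [...] mean value: ...` entry lists ten per-fold scores followed by their arithmetic mean, and `mcc` is the Matthews correlation coefficient. The minimal sketch below recomputes the `test_mcc` mean for the RidgeClassifierCV block from the printed values (rounded to 8 decimals, so agreement is only approximate) and shows the standard MCC formula from confusion-matrix counts. The helper names `summarise` and `mcc` are hypothetical and are not taken from the original scripts, which are not shown in this log.

```python
from math import sqrt
from statistics import fmean

# Per-fold test_mcc values copied from the RidgeClassifierCV block of the log
# (as printed, i.e. rounded to 8 decimals).
test_mcc = [0.78376623, 0.75237443, 0.67762003, 0.7137294, 0.82182846,
            0.74951538, 0.83897362, 0.74951538, 0.76477489, 0.64715023]


def summarise(key, values):
    """Hypothetical helper: print one metric in the log's layout, return its mean."""
    mean_value = fmean(values)
    print(f"key: {key}\nvalue: {values}\nmean value: {mean_value}")
    return mean_value


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a convention chosen for this sketch).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


mean_test_mcc = summarise("test_mcc", test_mcc)
```

The recomputed mean should agree with the logged `mean value` to about seven decimal places; the small residual comes from the rounding of the printed per-fold values.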