/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_rt.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq          339
log10_or_mychisq    339
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
index: 0 ind: 1 Mask count check: True
index: 1 ind: 2 Mask count check: True
index: 2 ind: 3 Mask count check: True
Original Data Counter({0: 545, 1: 30}) Data dim: (575, 176)
-------------------------------------------------------------
Successfully split data: REVERSE training
imputed values: training set
actual values: blind test set
Train data size: (575, 176)
Test data size: (557, 176)
y_train numbers: Counter({0: 545, 1: 30})
y_train ratio: 18.166666666666668
y_test_numbers: Counter({0: 282, 1: 275})
y_test ratio: 1.0254545454545454
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 545, 1: 545}) (1090, 176)
Simple Random UnderSampling
Counter({0: 30, 1: 30}) (60, 176)
Simple Combined Over and UnderSampling
Counter({0: 545, 1: 545}) (1090, 176)
SMOTE_NC OverSampling
Counter({0: 545, 1: 545}) (1090, 176)
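The four resampled class counts above bring the heavily imbalanced training set (545 vs 30) to balance in different ways. A minimal sketch of how such counts are typically produced with imbalanced-learn; the library choice and the variable names are assumptions (the log does not name them), with `X_train`, `y_train` and the categorical column list taken to come from the split reported above:

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler, SMOTENC
from imblearn.under_sampling import RandomUnderSampler

rs = 42  # assumed seed, mirroring the random_state=42 used for the models below

# Simple random over- and under-sampling to a 1:1 class ratio.
X_ros, y_ros = RandomOverSampler(random_state=rs).fit_resample(X_train, y_train)
X_rus, y_rus = RandomUnderSampler(random_state=rs).fit_resample(X_train, y_train)

# SMOTE-NC needs the positions of the categorical columns so it does not
# interpolate them as if they were continuous features.
cat_ix = [X_train.columns.get_loc(c) for c in categorical_cols]
X_smnc, y_smnc = SMOTENC(categorical_features=cat_ix,
                         random_state=rs).fit_resample(X_train, y_train)

print(Counter(y_ros), X_ros.shape)   # e.g. Counter({0: 545, 1: 545}) (1090, 176)
print(Counter(y_rus), X_rus.shape)   # e.g. Counter({0: 30, 1: 30}) (60, 176)
```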
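The SettingWithCopyWarning emitted at the very top of this log points at an in-place sort_values call on what pandas treats as a slice of another DataFrame. A minimal sketch of the usual remedy, with a toy frame standing in for the real mask_check (the filter condition is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'mutationinformation': ['a', 'b', 'c'],
                   'ligand_distance': [7.2, 3.1, 5.4]})

# Take an explicit copy of the slice, then sort by assignment
# instead of inplace=True on the slice.
mask_check = df[df['ligand_distance'] < 10].copy()
mask_check = mask_check.sort_values(by=['ligand_distance'], ascending=True)
```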
#####################################################################
Running ML analysis: REVERSE training
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_rt/
Sanity checks:
Total input features: 176
Training data size: (575, 176)
Test data size: (557, 176)
Target feature numbers (training data): Counter({0: 545, 1: 30})
Target features ratio (training data): 18.166666666666668
Target feature numbers (test data): Counter({0: 282, 1: 275})
Target features ratio (test data): 1.0254545454545454
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are: ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are: ['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are: ['maf', 'logorI'] ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are: ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
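The "Pass: No. of features match" line above is a simple bookkeeping check over the groups just listed; a sketch of the arithmetic presumably behind it (the variable names are illustrative, not the script's):

```python
n_structural, n_aaindex, n_evolutionary, n_genomic = 37, 123, 3, 6
n_categorical = 7

n_numerical = n_structural + n_aaindex + n_evolutionary + n_genomic
assert n_numerical == 169                  # "No. of numerical features: 169"
assert n_numerical + n_categorical == 176  # "Total input features: 176"
```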
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
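The pipeline printed above min-max scales the 169 numerical columns, one-hot encodes the 7 categorical ones, passes anything else through, and then fits the estimator. The key/value blocks that follow (fit_time, score_time, test_mcc, train_mcc, ...) have the shape of scikit-learn's cross_validate output with train scores enabled. A sketch of that arrangement; the use of cross_validate, the scorer names and the 10-fold setting are inferred from the output rather than taken from the script, and `numerical_cols`, `categorical_cols`, `X_train`, `y_train` are assumed to hold the column groups and data described earlier in this log:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Scale numeric features, one-hot encode categorical ones, pass the rest through.
prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), numerical_cols),
                                       ('cat', OneHotEncoder(), categorical_cols)],
                         remainder='passthrough')

pipe = Pipeline(steps=[('prep', prep),
                       ('model', LogisticRegression(random_state=42))])

scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

cv_out = cross_validate(pipe, X_train, y_train, cv=10,
                        scoring=scoring, return_train_score=True)
# cv_out is a dict with keys such as 'fit_time', 'score_time',
# 'test_mcc', 'train_mcc', 'test_jcc', 'train_jcc', ...
```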
key: fit_time
value: [0.03620291 0.03169894 0.03200722 0.03302526 0.03463435 0.03355765 0.03698945 0.03275728 0.03071117 0.02976274]
mean value: 0.03313469886779785
key: score_time
value: [0.01243091 0.0119307 0.01226759 0.01270723 0.01222587 0.01198983 0.01438379 0.01209283 0.01194143 0.01193285]
mean value: 0.01239030361175537
key: test_mcc
value: [0.64848485 0.56713087 0.56713087 0. 0.38251843 0.56694671 0.2962963 0.38204659 0.80903983 0.64814815]
mean value: 0.4867742597878121
key: train_mcc
value: [0.78551705 0.78551705 0.78589709 0.8321053 0.80909988 0.78553305 0.76169209 0.80911474 0.78553305 0.80911474]
mean value: 0.7949124038845846
key: test_accuracy
value: [0.96551724 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228 0.92982456 0.94736842 0.98245614 0.96491228]
mean value: 0.9582577132486388
key: train_accuracy
value: [0.98065764 0.98065764 0.98065764 0.98452611 0.98259188 0.98069498 0.97876448 0.98262548 0.98069498 0.98262548]
mean value: 0.9814496314496315
key: test_fscore
value: [0.66666667 0.5 0.5 0. 0.4 0.5 0.33333333 0.4 0.8 0.66666667]
mean value: 0.4766666666666667
key: train_fscore
value: [0.77272727 0.77272727 0.7826087 0.82608696 0.8 0.77272727 0.75555556 0.8 0.77272727 0.8]
mean value: 0.7855160298638559
key: test_precision
value: [0.66666667 1. 1. 0. 0.5 1. 0.33333333 0.5 1. 0.66666667]
mean value: 0.6666666666666666
key: train_precision
value: [1. 1. 0.94736842 1. 1. 1. 0.94444444 1. 1. 1.]
mean value: 0.9891812865497076
key: test_recall
value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333 0.33333333 0.33333333 0.66666667 0.66666667]
mean value: 0.39999999999999997
key: train_recall
value: [0.62962963 0.62962963 0.66666667 0.7037037 0.66666667 0.62962963 0.62962963 0.66666667 0.62962963 0.66666667]
mean value: 0.6518518518518519
key: test_roc_auc
value: [0.82424242 0.66666667 0.66666667 0.5 0.65757576 0.66666667 0.64814815 0.65740741 0.83333333 0.82407407]
mean value: 0.6944781144781145
key: train_roc_auc
value: [0.81481481 0.81481481 0.83231293 0.85185185 0.83333333 0.81481481 0.81379648 0.83333333 0.81481481 0.83333333]
mean value: 0.8257220521157094
key: test_jcc
value: [0.5 0.33333333 0.33333333 0. 0.25 0.33333333 0.2 0.25 0.66666667 0.5]
mean value: 0.33666666666666667
key: train_jcc
value: [0.62962963 0.62962963 0.64285714 0.7037037 0.66666667 0.62962963 0.60714286 0.66666667 0.62962963 0.66666667]
mean value: 0.6472222222222223
MCC on Blind test: 0.2
Accuracy on Blind test: 0.55
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
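The lbfgs ConvergenceWarning above is raised repeatedly during the LogisticRegressionCV fits, and its message names the two standard remedies: more iterations or rescaled data. The features are already MinMax-scaled inside the pipeline, so the iteration budget is the usual knob to turn; a hedged sketch (the max_iter value is illustrative, not the script's setting):

```python
from sklearn.linear_model import LogisticRegressionCV

# Raise the lbfgs iteration cap well above the default of 100 so each
# regularisation candidate can converge on the scaled feature matrix.
model = LogisticRegressionCV(random_state=42, max_iter=3000)
```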
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.88869524 0.74637437 0.73213053 0.80213022 0.75085831 0.79953766 0.74605346 0.7229135 0.95076203 0.73621321]
mean value: 0.78756685256958
key: score_time
value: [0.01215458 0.01214933 0.01236701 0.01245666 0.01223445 0.01911211 0.01208758 0.01212478 0.01212263 0.01623821]
mean value: 0.013304734230041504
key: test_mcc
value: [ 0.80917359 0.56713087 0.56713087 0.
-0.03093441 0.56694671 0.2962963 0.38204659 0.80903983 0.2962963 ] mean value: 0.4263126653890895 key: train_mcc value: [0.5982906 0.62811222 0.89810042 0.65669104 0.68418345 0.68420281 0.65717642 0.62813245 0.62813245 0.98030899] mean value: 0.7043330842580746 key: test_accuracy value: [0.98275862 0.96551724 0.96551724 0.94827586 0.93103448 0.96491228 0.92982456 0.94736842 0.98245614 0.92982456] mean value: 0.9547489413188143 key: train_accuracy value: [0.96711799 0.96905222 0.99032882 0.97098646 0.9729207 0.97297297 0.97104247 0.96911197 0.96911197 0.9980695 ] mean value: 0.9750715069864005 key: test_fscore value: [0.8 0.5 0.5 0. 0. 0.5 0.33333333 0.4 0.8 0.33333333] mean value: 0.4166666666666667 key: train_fscore value: [0.54054054 0.57894737 0.89795918 0.61538462 0.65 0.65 0.63414634 0.57894737 0.57894737 0.98113208] mean value: 0.6706004861796896 key: test_precision value: [1. 1. 1. 0. 0. 1. 0.33333333 0.5 1. 0.33333333] mean value: 0.6166666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.92857143 1. 1. 1. ] mean value: 0.9928571428571429 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.33333333 0.33333333 0.66666667 0.33333333] mean value: 0.3333333333333333 key: train_recall value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148 0.48148148 0.40740741 0.40740741 0.96296296] mean value: 0.5259259259259259 key: test_roc_auc value: [0.83333333 0.66666667 0.66666667 0.5 0.49090909 0.66666667 0.64814815 0.65740741 0.83333333 0.64814815] mean value: 0.6611279461279461 key: train_roc_auc value: [0.68518519 0.7037037 0.90740741 0.72222222 0.74074074 0.74074074 0.73972241 0.7037037 0.7037037 0.98148148] mean value: 0.762861129969073 key: test_jcc value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.2 0.25 0.66666667 0.2 ] mean value: 0.29833333333333334 key: train_jcc value: [0.37037037 0.40740741 0.81481481 0.44444444 0.48148148 0.48148148 0.46428571 0.40740741 0.40740741 0.96296296] mean value: 0.5242063492063492 MCC on Blind test: 0.17 Accuracy on Blind test: 0.54 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01631737 0.01177216 0.01084805 0.01043868 0.0100987 0.0101819 0.01021171 0.01049137 0.01006961 0.01019645] mean value: 0.01106259822845459 key: score_time value: [0.01252532 0.00943542 0.00935078 0.00916195 0.00890803 0.00936007 0.00899339 0.00960541 0.00887513 0.00927067] mean value: 0.009548616409301759 key: test_mcc value: [0.43452409 0.51168172 0.1762953 0.28417826 0.45726459 0.28291258 0.51099032 0.30441977 0.68718427 0.51099032] mean value: 0.41604412240331595 key: train_mcc value: [0.42281221 0.43803842 0.60914572 0.45594583 0.46618385 0.43552025 0.44078566 0.44401007 0.42989307 0.49523539] mean value: 0.4637570449954174 key: test_accuracy value: [0.82758621 0.87931034 0.87931034 0.82758621 0.84482759 0.8245614 0.87719298 0.84210526 0.94736842 0.87719298] mean value: 0.8627041742286751 key: train_accuracy value: [0.82978723 0.84139265 0.92843327 0.86460348 0.86073501 0.83976834 0.84362934 0.85714286 0.84749035 0.87837838] mean value: 0.8591360910509847 key: test_fscore value: [0.375 0.46153846 0.22222222 0.28571429 0.4 0.28571429 0.46153846 0.30769231 0.66666667 0.46153846] mean value: 0.39276251526251527 key: train_fscore value: [0.37142857 0.3880597 0.58426966 0.41666667 0.41935484 0.38518519 0.39097744 0.40322581 0.3875969 0.45217391] mean value: 0.4198938688732906 key: test_precision value: [0.23076923 0.3 0.16666667 0.18181818 0.25 0.18181818 0.3 0.2 0.5 0.3 ] mean value: 0.2611072261072261 key: train_precision value: [0.2300885 0.24299065 0.41935484 0.2688172 0.26804124 0.24074074 0.24528302 0.25773196 0.24509804 0.29545455] mean value: 0.2713600732946767 key: test_recall value: [1. 1. 0.33333333 0.66666667 1. 0.66666667 1. 0.66666667 1. 1. 
] mean value: 0.8333333333333334 key: train_recall value: [0.96296296 0.96296296 0.96296296 0.92592593 0.96296296 0.96296296 0.96296296 0.92592593 0.92592593 0.96296296] mean value: 0.9518518518518518 key: test_roc_auc value: [0.90909091 0.93636364 0.62121212 0.75151515 0.91818182 0.75 0.93518519 0.75925926 0.97222222 0.93518519] mean value: 0.8488215488215488 key: train_roc_auc value: [0.89270597 0.89882842 0.94474679 0.89357521 0.9090325 0.89797843 0.90001509 0.88964321 0.88455156 0.91834503] mean value: 0.9029422192049483 key: test_jcc value: [0.23076923 0.3 0.125 0.16666667 0.25 0.16666667 0.3 0.18181818 0.5 0.3 ] mean value: 0.2520920745920746 key: train_jcc value: [0.22807018 0.24074074 0.41269841 0.26315789 0.26530612 0.23853211 0.24299065 0.25252525 0.24038462 0.29213483] mean value: 0.2676540809731464 MCC on Blind test: 0.43 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01049566 0.01031566 0.01039696 0.01031137 0.01143765 0.01037836 0.0102942 0.01034856 0.01027584 0.01027036] mean value: 0.010452461242675782 key: score_time value: [0.00897431 0.00893593 0.00913882 0.00890827 0.00900865 0.00888872 0.00890207 0.00890517 0.0088551 0.00887251] mean value: 0.008938956260681152 key: test_mcc value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659 -0.03149704 0.38204659 0.76011695 0.80903983] mean value: 0.37036095219850856 key: train_mcc value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056 0.58514128 0.58191044 0.58514128 0.5955795 ] mean value: 0.5882533192252368 key: test_accuracy value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842 0.92982456 0.94736842 0.96491228 0.98245614] mean value: 0.9444343617664852 key: train_accuracy value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996 0.95752896 0.96138996 0.95752896 0.96138996] mean value: 0.9598067257641726 key: test_fscore value: [0.57142857 0.5 0.4 0. 0. 0.4 0. 0.4 0.75 0.8 ] mean value: 0.3821428571428572 key: train_fscore /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714 0.60714286 0.6 0.60714286 0.61538462] mean value: 0.6088222284559688 key: test_precision value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ] mean value: 0.46 key: train_precision value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966 0.5862069 0.65217391 0.5862069 0.64 ] mean value: 0.6198097874139853 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0. 0.33333333 1. 
0.66666667] mean value: 0.36666666666666664 key: train_recall value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667 0.62962963 0.55555556 0.62962963 0.59259259] mean value: 0.6 key: test_roc_auc value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741 0.49074074 0.65740741 0.98148148 0.83333333] mean value: 0.6714309764309764 key: train_roc_auc value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317 0.80259486 0.76963114 0.80259486 0.78713133] mean value: 0.7898067251340455 key: test_jcc value: [0.4 0.33333333 0.25 0. 0. 0.25 0. 0.25 0.6 0.66666667] mean value: 0.275 key: train_jcc value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421 0.43589744 0.42857143 0.43589744 0.44444444] mean value: 0.4380434714645241 MCC on Blind test: 0.2 Accuracy on Blind test: 0.56 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00958323 0.01032209 0.01076818 0.01059055 0.01070786 0.0109334 0.00970054 0.0109117 0.01076508 0.00970888] mean value: 0.010399150848388671 key: score_time value: [0.07445121 0.01275253 0.0135901 0.0133009 0.01288939 0.01299 0.01248527 0.01255918 0.01344275 0.01351595] mean value: 0.019197726249694826 key: test_mcc value: [0.80917359 0.56713087 0.56713087 0. 0. 0. 0. 0.56694671 0.56694671 0. ] mean value: 0.30773287583715697 key: train_mcc value: [0.4990914 0.46161651 0.4990914 0.5982906 0.56702937 0.59831103 0.59831103 0.53409532 0.53409532 0.53409532] mean value: 0.5424027293731967 key: test_accuracy value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842 0.94736842 0.96491228 0.96491228 0.94736842] mean value: 0.958227465214761 key: train_accuracy value: [0.96131528 0.95938104 0.96131528 0.96711799 0.96518375 0.96718147 0.96718147 0.96332046 0.96332046 0.96332046] mean value: 0.9638637670552564 key: test_fscore value: [0.8 0.5 0.5 0. 0. 0. 0. 0.5 0.5 0. ] mean value: 0.28 key: train_fscore value: [0.41176471 0.36363636 0.41176471 0.54054054 0.5 0.54054054 0.54054054 0.45714286 0.45714286 0.45714286] mean value: 0.46802159684512623 key: test_precision value: [1. 1. 1. 0. 0. 0. 0. 1. 1. 0.] mean value: 0.5 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0. 0. 0.33333333 0.33333333 0. ] mean value: 0.19999999999999998 key: train_recall value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037 0.37037037 0.2962963 0.2962963 0.2962963 ] mean value: 0.3074074074074074 key: test_roc_auc value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.5 0.5 0.66666667 0.66666667 0.5 ] mean value: 0.6 key: train_roc_auc value: [0.62962963 0.61111111 0.62962963 0.68518519 0.66666667 0.68518519 0.68518519 0.64814815 0.64814815 0.64814815] mean value: 0.6537037037037037 key: test_jcc value: [0.66666667 0.33333333 0.33333333 0. 0. 0. 0. 0.33333333 0.33333333 0. ] mean value: 0.19999999999999998 key: train_jcc value: [0.25925926 0.22222222 0.25925926 0.37037037 0.33333333 0.37037037 0.37037037 0.2962963 0.2962963 0.2962963 ] mean value: 0.3074074074074074 MCC on Blind test: 0.12 Accuracy on Blind test: 0.52 Model_name: SVM Model func: SVC(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. 
Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01754379 0.01406407 0.01333332 0.01329112 0.01354933 0.0141921 0.01471233 0.01526451 0.01424551 0.0159626 ] mean value: 0.014615869522094727 key: score_time value: [0.01249647 0.00990486 0.0098834 0.00987887 0.0101161 0.00993848 0.00975871 0.01059246 0.0107305 0.0111804 ] mean value: 0.010448026657104491 key: test_mcc value: [ 0. 0.56713087 0.56713087 0. 0. 0. -0.03149704 0.56694671 0. 0.56694671] mean value: 0.22366581252414472 key: train_mcc value: [0.53407501 0.53407501 0.56702937 0.5982906 0.65669104 0.53409532 0.62813245 0.62813245 0.53409532 0.53409532] mean value: 0.5748711879647856 key: test_accuracy value: [0.94827586 0.96551724 0.96551724 0.94827586 0.94827586 0.94736842 0.92982456 0.96491228 0.94736842 0.96491228] mean value: 0.9530248033877798 key: train_accuracy value: [0.96324952 0.96324952 0.96518375 0.96711799 0.97098646 0.96332046 0.96911197 0.96911197 0.96332046 0.96332046] mean value: 0.9657972562227881 key: test_fscore value: [0. 0.5 0.5 0. 0. 0. 0. 0.5 0. 0.5] mean value: 0.2 key: train_fscore value: [0.45714286 0.45714286 0.5 0.54054054 0.61538462 0.45714286 0.57894737 0.57894737 0.45714286 0.45714286] mean value: 0.5099534178481546 key: test_precision value: [0. 1. 1. 0. 0. 0. 0. 1. 0. 1.] mean value: 0.4 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0. 0.33333333 0.33333333 0. 0. 0. 0. 0.33333333 0. 0.33333333] mean value: 0.13333333333333333 key: train_recall value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963 0.40740741 0.40740741 0.2962963 0.2962963 ] mean value: 0.34444444444444444 key: test_roc_auc value: [0.5 0.66666667 0.66666667 0.5 0.5 0.5 0.49074074 0.66666667 0.5 0.66666667] mean value: 0.5657407407407408 key: train_roc_auc value: [0.64814815 0.64814815 0.66666667 0.68518519 0.72222222 0.64814815 0.7037037 0.7037037 0.64814815 0.64814815] mean value: 0.6722222222222222 key: test_jcc value: [0. 0.33333333 0.33333333 0. 0. 0. 0. 0.33333333 0. 
0.33333333] mean value: 0.13333333333333333 key: train_jcc value: [0.2962963 0.2962963 0.33333333 0.37037037 0.44444444 0.2962963 0.40740741 0.40740741 0.2962963 0.2962963 ] mean value: 0.34444444444444444 MCC on Blind test: 0.11 Accuracy on Blind test: 0.52 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.86374784 2.09029508 1.63843369 2.03394628 1.90827298 1.78310966 1.99839234 2.06304955 1.93170047 1.81367326] mean value: 1.9124621152877808 key: score_time value: [0.01261926 0.01259279 0.01266718 0.01298261 0.01273131 0.01226902 0.01244545 0.01506019 0.01257229 0.01240182] mean value: 0.01283419132232666 key: test_mcc value: [0.76038268 0.5508895 0.38251843 0. 0.2969697 0.56694671 0.68718427 0.38204659 0.80903983 0.38204659] mean value: 0.4818024290736844 key: train_mcc value: [1. 1. 1. 1. 1. 0.9609263 0.98030899 1. 0.9609263 1. ] mean value: 0.990216159887024 key: test_accuracy value: [0.96551724 0.94827586 0.94827586 0.94827586 0.93103448 0.96491228 0.94736842 0.94736842 0.98245614 0.94736842] mean value: 0.9530852994555353 key: train_accuracy value: [1. 1. 1. 1. 1. 0.996139 0.9980695 1. 0.996139 1. ] mean value: 0.9990347490347491 key: test_fscore value: [0.75 0.57142857 0.4 0. 0.33333333 0.5 0.66666667 0.4 0.8 0.4 ] mean value: 0.48214285714285715 key: train_fscore value: [1. 1. 1. 1. 1. 0.96296296 0.98113208 1. 0.96296296 1. ] mean value: 0.9907058001397624 key: test_precision value: [0.6 0.5 0.5 0. 0.33333333 1. 0.5 0.5 1. 0.5 ] mean value: 0.5433333333333333 key: train_precision value: [1. 1. 1. 1. 1. 0.96296296 1. 1. 0.96296296 1. ] mean value: 0.9925925925925926 key: test_recall value: [1. 0.66666667 0.33333333 0. 0.33333333 0.33333333 1. 0.33333333 0.66666667 0.33333333] mean value: 0.5 key: train_recall value: [1. 1. 1. 1. 1. 0.96296296 0.96296296 1. 0.96296296 1. ] mean value: 0.9888888888888889 key: test_roc_auc value: [0.98181818 0.81515152 0.65757576 0.5 0.64848485 0.66666667 0.97222222 0.65740741 0.83333333 0.65740741] mean value: 0.739006734006734 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.98046315 0.98148148 1. 0.98046315 1. ] mean value: 0.9942407784566644 key: test_jcc value: [0.6 0.4 0.25 0. 0.2 0.33333333 0.5 0.25 0.66666667 0.25 ] mean value: 0.345 key: train_jcc value: [1. 1. 1. 1. 1. 0.92857143 0.96296296 1. 0.92857143 1. 
] mean value: 0.982010582010582 MCC on Blind test: 0.24 Accuracy on Blind test: 0.57 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02618551 0.02589345 0.02173519 0.02467251 0.02685261 0.02534461 0.0224154 0.02488923 0.02453709 0.01868343] mean value: 0.024120903015136717 key: score_time value: [0.01264358 0.00964165 0.00909233 0.00906181 0.00916147 0.00893998 0.00907493 0.00904655 0.00892591 0.00914407] mean value: 0.009473228454589843 key: test_mcc value: [0.5508895 0.64848485 0.56713087 0.56713087 0.2969697 0.76011695 0.55039532 0.64814815 0.80903983 0.2962963 ] mean value: 0.5694602338563187 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
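The per-model blocks in this log (a Pipeline(...) repr followed by fit_time, score_time and paired test_*/train_* keys, each holding ten per-fold values and a mean) are the kind of output scikit-learn's cross_validate produces when a ColumnTransformer-plus-model pipeline is scored with a dictionary of scorers. Below is a minimal, self-contained sketch of that wiring; the toy data, column names and scorer choices are illustrative stand-ins, not taken from the project's script.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score

# Toy stand-in data: two numeric columns and one categorical column.
rng = np.random.default_rng(42)
n = 60
X = pd.DataFrame({'ligand_distance': rng.normal(10, 3, n),
                  'deepddg': rng.normal(0, 1, n),
                  'ss_class': rng.choice(['helix', 'sheet', 'coil'], n)})
y = pd.Series([0, 1] * (n // 2))
num_cols, cat_cols = ['ligand_distance', 'deepddg'], ['ss_class']

# Scale numeric columns, one-hot encode categorical ones, pass anything else through.
prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), num_cols),
                                       ('cat', OneHotEncoder(), cat_cols)],
                         remainder='passthrough')
pipe = Pipeline(steps=[('prep', prep), ('model', BernoulliNB())])

# Scorer dictionary whose keys mirror the metric names printed in this log.
scoring = {'mcc': make_scorer(matthews_corrcoef),
           'accuracy': 'accuracy',
           'fscore': 'f1',
           'precision': 'precision',
           'recall': 'recall',
           'roc_auc': 'roc_auc',
           'jcc': make_scorer(jaccard_score)}

cv_out = cross_validate(pipe, X, y, cv=10, scoring=scoring, return_train_score=True)
# cv_out has keys such as 'fit_time', 'score_time', 'test_mcc', 'train_mcc', ...
# each a 10-element array (one entry per fold), matching the value/mean lines above.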
mean value: 1.0 key: test_accuracy value: [0.94827586 0.96551724 0.96551724 0.96551724 0.93103448 0.96491228 0.94736842 0.96491228 0.98245614 0.92982456] mean value: 0.9565335753176043 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.57142857 0.66666667 0.5 0.5 0.33333333 0.75 0.57142857 0.66666667 0.8 0.33333333] mean value: 0.5692857142857143 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.66666667 1. 1. 0.33333333 0.6 0.5 0.66666667 1. 0.33333333] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) mean value: 0.66 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 1. 0.66666667 0.66666667 0.66666667 0.33333333] mean value: 0.5666666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.81515152 0.82424242 0.66666667 0.66666667 0.64848485 0.98148148 0.81481481 0.82407407 0.83333333 0.64814815] mean value: 0.7723063973063973 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.4 0.5 0.33333333 0.33333333 0.2 0.6 0.4 0.5 0.66666667 0.2 ] mean value: 0.41333333333333333 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.25 Accuracy on Blind test: 0.58 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10347772 0.10542083 0.1026392 0.10236788 0.1086688 0.10966682 0.10839295 0.10804629 0.10971022 0.10941029] mean value: 0.10678009986877442 key: score_time value: [0.01768446 0.01904488 0.01783371 0.01776528 0.01919889 0.01932764 0.01858878 0.0196681 0.01852608 0.01737475] mean value: 0.01850125789642334 key: test_mcc value: [0.80917359 0.56713087 0.56713087 0. 0.56713087 0.56694671 0.38204659 0.38204659 0.80903983 0.64814815] mean value: 0.5298794082235709 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98275862 0.96551724 0.96551724 0.94827586 0.96551724 0.96491228 0.94736842 0.94736842 0.98245614 0.96491228] mean value: 0.9634603750756201 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.5 0.5 0. 0.5 0.5 0.4 0.4 0.8 0.66666667] mean value: 0.5066666666666667 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 0. 1. 1. 0.5 0.5 1. 0.66666667] mean value: 0.7666666666666666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333 0.33333333 0.33333333 0.66666667 0.66666667] mean value: 0.39999999999999997 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.66666667 0.66666667 0.5 0.66666667 0.66666667 0.65740741 0.65740741 0.83333333 0.82407407] mean value: 0.6972222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.33333333 0.33333333 0. 0.33333333 0.33333333 0.25 0.25 0.66666667 0.5 ] mean value: 0.36666666666666664 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.54 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
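The UndefinedMetricWarning repeated throughout this log fires whenever a fold's classifier predicts no positive (resistant) samples at all, so precision takes a 0/0 form and scikit-learn substitutes 0.0. If that substitution is the intended behaviour, passing zero_division explicitly keeps the same number and drops the warning. A small sketch follows; the scorer wiring at the end is illustrative, not taken from the script.

from sklearn.metrics import precision_score, f1_score, make_scorer

y_true = [0, 0, 0, 0, 1]   # toy fold containing a single positive
y_pred = [0, 0, 0, 0, 0]   # model predicted no positives, so precision is 0/0

# Explicit zero_division: returns 0.0 without emitting UndefinedMetricWarning.
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0
print(f1_score(y_true, y_pred, zero_division=0))          # 0.0

# The same argument can be threaded into CV scorers (illustrative names):
precision_scorer = make_scorer(precision_score, zero_division=0)
f1_scorer = make_scorer(f1_score, zero_division=0)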
_warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01089549 0.00997782 0.01006174 0.00988054 0.01028514 0.00988078 0.01003456 0.00997853 0.01007175 0.01044822] mean value: 0.01015145778656006 key: score_time value: [0.0088129 0.0086453 0.00865483 0.00858808 0.00892234 0.00854993 0.00863981 0.00871801 0.00876451 0.00892496] mean value: 0.008722066879272461 key: test_mcc value: [ 0.48301038 0.38251843 0.2969697 -0.04413674 -0.03093441 0.56694671 0.20464687 0.24282147 0. 0.17516462] mean value: 0.22770070155826885 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93103448 0.94827586 0.93103448 0.9137931 0.93103448 0.96491228 0.89473684 0.9122807 0.94736842 0.87719298] mean value: 0.9251663641863279 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 
1. 1.] mean value: 1.0 key: test_fscore value: [0.5 0.4 0.33333333 0. 0. 0.5 0.25 0.28571429 0. 0.22222222] mean value: 0.24912698412698414 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.4 0.5 0.33333333 0. 0. 1. 0.2 0.25 0. 0.16666667] mean value: 0.285 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.33333333 0.33333333 0. 0.33333333] mean value: 0.26666666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.80606061 0.65757576 0.64848485 0.48181818 0.49090909 0.66666667 0.62962963 0.63888889 0.5 0.62037037] mean value: 0.614040404040404 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.33333333 0.25 0.2 0. 0. 0.33333333 0.14285714 0.16666667 0. 0.125 ] mean value: 0.1551190476190476 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.57 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
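The recurring pattern in these blocks, accuracy near 0.95 alongside much lower F-scores, Jaccard values and MCC, plus folds where no positives are predicted at all, is the signature of a heavily imbalanced target. The toy comparison below shows why accuracy alone is uninformative here and why MCC is reported next to it; the numbers mirror a fold with 3 positives among 57 samples for illustration and are not taken from the run.

# Toy illustration: a "predict the majority class for everything" rule on an
# imbalanced fold (3 positives out of 57) already reaches ~0.95 accuracy,
# while F1 and MCC collapse to 0.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [1] * 3 + [0] * 54
y_majority = [0] * 57

print(accuracy_score(y_true, y_majority))                 # ~0.947
print(f1_score(y_true, y_majority, zero_division=0))      # 0.0
print(matthews_corrcoef(y_true, y_majority))              # 0.0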
_warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.54438972 1.56727147 1.54567933 1.55488396 1.54870415 1.54974079 1.54354572 1.55656648 1.52714944 1.55768704] mean value: 1.5495618104934692 key: score_time value: [0.0920825 0.08973002 0.08945179 0.08959389 0.08978343 0.08988523 0.08968854 0.08932042 0.09021282 0.08947134] mean value: 0.08992199897766114 key: test_mcc value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671 0.38204659 0.38204659 0.56694671 0.80903983] mean value: 0.46504617707858015 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228 0.94736842 0.94736842 0.96491228 0.98245614] mean value: 0.9617362371445856 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0.4 0.5 0.8] mean value: 0.44 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 0. 0. 1. 0.5 0.5 1. 1. ] mean value: 0.7 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.33333333 0.33333333 0.33333333 0.66666667] mean value: 0.3333333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667 0.65740741 0.65740741 0.66666667 0.83333333] mean value: 0.6648148148148147 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.25 0.25 0.33333333 0.66666667] mean value: 0.31666666666666665 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.54 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
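The FutureWarning above is triggered by the 'Random Forest2' entry in the model list, which passes max_features='auto'; from scikit-learn 1.1 that spelling is deprecated (slated for removal in 1.3), and for classifiers it is equivalent to 'sqrt'. A drop-in replacement keeping the other keyword arguments from the model list would look like the sketch below (an illustration, not a change made in this run).

from sklearn.ensemble import RandomForestClassifier

# Same estimator as 'Random Forest2', with the deprecated max_features='auto'
# spelled out as 'sqrt' (identical behaviour for classifiers, no FutureWarning).
rf2 = RandomForestClassifier(max_features='sqrt',
                             min_samples_leaf=5,
                             n_estimators=1000,
                             n_jobs=10,
                             oob_score=True,
                             random_state=42)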
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.79947305 0.99364781 0.97343969 1.00802183 0.97355986 0.97136927 0.97683573 0.94764853 0.94387889 0.96559119] mean value: 1.0553465843200684 key: score_time value: [0.22360277 0.19479609 0.35481954 0.11889696 0.24737167 0.24671912 0.25966692 0.24125576 0.23207593 0.26727366] mean value: 0.23864784240722656 key: test_mcc value: [0.80917359 0.56713087 0.56713087 0. 0. 0.56694671 0.38204659 0. 0. 0.56694671] mean value: 0.34593753471007405 key: train_mcc value: [0.73639347 0.65669104 0.76130255 0.71071615 0.73639347 0.68420281 0.65671091 0.68420281 0.68420281 0.76131958] mean value: 0.7072135583787191 key: test_accuracy value: [0.98275862 0.96551724 0.96551724 0.94827586 0.94827586 0.96491228 0.94736842 0.94736842 0.94736842 0.96491228] mean value: 0.958227465214761 key: train_accuracy value: [0.97678917 0.97098646 0.9787234 0.97485493 0.97678917 0.97297297 0.97104247 0.97297297 0.97297297 0.97876448] mean value: 0.9746869002188151 key: test_fscore value: [0.8 0.5 0.5 0. 0. 0.5 0.4 0. 0. 0.5] mean value: 0.32 key: train_fscore value: [0.71428571 0.61538462 0.74418605 0.68292683 0.71428571 0.65 0.61538462 0.65 0.65 0.74418605] mean value: 0.6780639581632207 key: test_precision value: [1. 1. 1. 0. 0. 1. 0.5 0. 0. 1. ] mean value: 0.55 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.33333333 0. 0. 0.33333333] mean value: 0.2333333333333333 key: train_recall value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148 0.44444444 0.48148148 0.48148148 0.59259259] mean value: 0.5148148148148148 key: test_roc_auc value: [0.83333333 0.66666667 0.66666667 0.5 0.5 0.66666667 0.65740741 0.5 0.5 0.66666667] mean value: 0.6157407407407407 key: train_roc_auc value: [0.77777778 0.72222222 0.7962963 0.75925926 0.77777778 0.74074074 0.72222222 0.74074074 0.74074074 0.7962963 ] mean value: 0.7574074074074074 key: test_jcc value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0.25 0. 0. 
0.33333333] mean value: 0.22499999999999998 key: train_jcc value: [0.55555556 0.44444444 0.59259259 0.51851852 0.55555556 0.48148148 0.44444444 0.48148148 0.48148148 0.59259259] mean value: 0.5148148148148148 MCC on Blind test: 0.13 Accuracy on Blind test: 0.52 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02531791 0.01029682 0.01034307 0.01032424 0.01029944 0.01025152 0.01029849 0.01025391 0.01028061 0.01024294] mean value: 0.011790895462036132 key: score_time value: [0.01097441 0.008816 0.00896859 0.00880456 0.00884557 0.00885034 0.00882292 0.00883365 0.00881696 0.00878763] mean value: 0.009052062034606933 key: test_mcc value: [ 0.5508895 0.56713087 0.38251843 -0.04413674 -0.05454545 0.38204659 -0.03149704 0.38204659 0.76011695 0.80903983] mean value: 0.37036095219850856 key: train_mcc value: [0.58249009 0.5427084 0.62227177 0.55500996 0.60935992 0.62292056 0.58514128 0.58191044 0.58514128 0.5955795 ] mean value: 0.5882533192252368 key: test_accuracy value: [0.94827586 0.96551724 0.94827586 0.9137931 0.89655172 0.94736842 0.92982456 0.94736842 0.96491228 0.98245614] mean value: 0.9444343617664852 key: train_accuracy value: [0.95938104 0.95551257 0.96324952 0.95744681 0.96324952 0.96138996 0.95752896 0.96138996 0.95752896 0.96138996] mean value: 0.9598067257641726 key: test_fscore value: [0.57142857 0.5 0.4 0. 0. 0.4 0. 0.4 0.75 0.8 ] mean value: 0.3821428571428572 key: train_fscore value: [0.60377358 0.56603774 0.64150943 0.57692308 0.62745098 0.64285714 0.60714286 0.6 0.60714286 0.61538462] mean value: 0.6088222284559688 key: test_precision value: [0.5 1. 0.5 0. 0. 0.5 0. 0.5 0.6 1. ] mean value: 0.46 key: train_precision value: [0.61538462 0.57692308 0.65384615 0.6 0.66666667 0.62068966 0.5862069 0.65217391 0.5862069 0.64 ] mean value: 0.6198097874139853 key: test_recall value: [0.66666667 0.33333333 0.33333333 0. 0. 0.33333333 0. 0.33333333 1. 0.66666667] mean value: 0.36666666666666664 key: train_recall value: [0.59259259 0.55555556 0.62962963 0.55555556 0.59259259 0.66666667 0.62962963 0.55555556 0.62962963 0.59259259] mean value: 0.6 key: test_roc_auc value: [0.81515152 0.66666667 0.65757576 0.48181818 0.47272727 0.65740741 0.49074074 0.65740741 0.98148148 0.83333333] mean value: 0.6714309764309764 key: train_roc_auc value: [0.78609221 0.76655329 0.80563114 0.7675737 0.78813303 0.8221317 0.80259486 0.76963114 0.80259486 0.78713133] mean value: 0.7898067251340455 key: test_jcc value: [0.4 0.33333333 0.25 0. 0. 0.25 0. 0.25 0.6 0.66666667] mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) 0.275 key: train_jcc value: [0.43243243 0.39473684 0.47222222 0.40540541 0.45714286 0.47368421 0.43589744 0.42857143 0.43589744 0.44444444] mean value: 0.4380434714645241 MCC on Blind test: 0.2 Accuracy on Blind test: 0.56 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10296011 0.07609224 0.06126785 0.07784891 0.08386159 0.0641036 0.06421399 0.07174659 0.06452918 0.06336784] mean value: 0.07299919128417968 key: score_time value: [0.01094532 0.01224589 0.01145506 0.01107168 0.01102376 0.01049256 0.01052618 0.01079702 0.01050138 0.01053238] mean value: 0.010959124565124512 key: test_mcc value: [ 0.64848485 0.80917359 0.56713087 0. -0.03093441 0.56694671 0.80903983 0.64814815 1. 0.64814815] mean value: 0.5666137744534676 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96551724 0.98275862 0.96551724 0.94827586 0.93103448 0.96491228 0.98245614 0.96491228 1. 0.96491228] mean value: 0.9670296430732003 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.8 0.5 0. 0. 0.5 0.8 0.66666667 1. 0.66666667] mean value: 0.56 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 1. 1. 0. 0. 1. 1. 0.66666667 1. 0.66666667] mean value: 0.7 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 0.33333333 0. 0. 0.33333333 0.66666667 0.66666667 1. 0.66666667] mean value: 0.5 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.82424242 0.83333333 0.66666667 0.5 0.49090909 0.66666667 0.83333333 0.82407407 1. 0.82407407] mean value: 0.7463299663299663 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.66666667 0.33333333 0. 0. 0.33333333 0.66666667 0.5 1. 0.5 ] mean value: 0.45 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
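Each model summary also ends with an "MCC on Blind test" / "Accuracy on Blind test" pair. A hedged sketch of how those two figures could be produced from a fitted pipeline is shown here; the function name and the X_bts/y_bts argument names are illustrative placeholders, not identifiers from the script.

# Sketch (assumed, not the original code): score a fitted pipeline on the
# held-out blind test split and print the two summary figures seen in the log.
from sklearn.metrics import matthews_corrcoef, accuracy_score

def blind_test_report(fitted_pipe, X_bts, y_bts):
    """Print blind-test MCC and accuracy rounded to 2 dp, as in the log."""
    y_pred = fitted_pipe.predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))

# e.g. blind_test_report(pipe.fit(X_train, y_train), X_bts, y_bts)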
mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.58 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0533421 0.05394554 0.07271242 0.05558991 0.0737896 0.08503461 0.06888843 0.09786415 0.09248352 0.04690504] mean value: 0.07005553245544434 key: score_time value: [0.01226044 0.01864839 0.01218653 0.01870155 0.01248217 0.0122149 0.01872897 0.03266191 0.02536178 0.0123961 ] mean value: 0.017564272880554198 key: test_mcc value: [0.76038268 0.32993527 0.24366266 0.38251843 0.38251843 0.56694671 0.39056329 0.35714286 0.56694671 0.39056329] mean value: 0.4371180313243308 key: train_mcc value: [0.84368859 0.8348779 0.90074358 0.8609619 0.87923612 0.83872426 0.86097638 0.87924838 0.86097638 0.83872426] mean value: 0.859815774084354 key: test_accuracy value: [0.96551724 0.86206897 0.9137931 0.94827586 0.94827586 0.96491228 0.89473684 0.87719298 0.96491228 0.89473684] mean value: 0.9234422262552934 key: train_accuracy value: [0.98452611 0.98452611 0.99032882 0.98646035 0.98839458 0.98455598 0.98648649 0.98841699 0.98648649 0.98455598] mean value: 0.9864737907291099 key: test_fscore value: [0.75 0.33333333 0.28571429 0.4 0.4 0.5 0.4 0.36363636 0.5 0.4 ] mean value: 0.43326839826839825 key: train_fscore value: [0.85185185 0.84 0.90566038 0.86792453 0.88461538 0.84615385 0.86792453 0.88461538 0.86792453 0.84615385] mean value: 0.8662824275654464 key: test_precision value: [0.6 0.22222222 0.25 0.5 0.5 1. 0.28571429 0.25 1. 0.28571429] mean value: 0.48936507936507934 key: train_precision value: [0.85185185 0.91304348 0.92307692 0.88461538 0.92 0.88 0.88461538 0.92 0.88461538 0.88 ] mean value: 0.8941818407035799 key: test_recall value: [1. 
0.66666667 0.33333333 0.33333333 0.33333333 0.33333333 0.66666667 0.66666667 0.33333333 0.66666667] mean value: 0.5333333333333333 key: train_recall value: [0.85185185 0.77777778 0.88888889 0.85185185 0.85185185 0.81481481 0.85185185 0.85185185 0.85185185 0.81481481] mean value: 0.8407407407407408 key: test_roc_auc value: [0.98181818 0.76969697 0.63939394 0.65757576 0.65757576 0.66666667 0.78703704 0.77777778 0.66666667 0.78703704] mean value: 0.7391245791245791 key: train_roc_auc value: [0.92184429 0.88684807 0.94240363 0.9228647 0.92388511 0.90435242 0.92287094 0.92388927 0.92287094 0.90435242] mean value: 0.9176181778436652 key: test_jcc value: [0.6 0.2 0.16666667 0.25 0.25 0.33333333 0.25 0.22222222 0.33333333 0.25 ] mean value: 0.28555555555555556 key: train_jcc value: [0.74193548 0.72413793 0.82758621 0.76666667 0.79310345 0.73333333 0.76666667 0.79310345 0.76666667 0.73333333] mean value: 0.7646533185020393 MCC on Blind test: 0.24 Accuracy on Blind test: 0.58 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01456141 0.01192737 0.01130104 0.01136208 0.01116967 0.01007843 0.0100646 0.01123548 0.00998092 0.01085925] mean value: 0.011254024505615235 key: score_time value: [0.01082349 0.00998855 0.00960755 0.00958705 0.00879359 0.00871301 0.0093348 0.00888252 0.00888968 0.00979233] mean value: 0.009441256523132324 key: test_mcc value: [ 0.85811633 0.48301038 0.38251843 -0.03093441 0.20563808 0.64814815 0.24282147 0.2962963 0.39056329 0.55039532] mean value: 0.40265733298043693 key: train_mcc value: [0.472191 0.46384474 0.49044321 0.47807824 0.55306158 0.49897952 0.48527524 0.51651884 0.42130112 0.48236557] mean value: 0.4862059055721764 key: test_accuracy value: [0.98275862 0.93103448 0.94827586 0.93103448 0.89655172 0.96491228 0.9122807 0.92982456 0.89473684 0.94736842] mean value: 0.9338777979431336 key: train_accuracy value: [0.94197292 0.94003868 0.94197292 0.93423598 0.9516441 0.94401544 0.93629344 0.94401544 0.93436293 0.94015444] mean value: 0.9408706302323324 key: test_fscore value: [0.85714286 0.5 0.4 0. 0.25 0.66666667 0.28571429 0.33333333 0.4 0.57142857] mean value: 0.42642857142857143 key: train_fscore value: [0.5 0.49180328 0.51612903 0.5 0.57627119 0.52459016 0.50746269 0.53968254 0.4516129 0.50793651] mean value: 0.5115488298733711 key: test_precision value: [0.75 0.4 0.5 0. 0.2 0.66666667 0.25 0.33333333 0.28571429 0.5 ] mean value: 0.38857142857142857 key: train_precision value: [0.45454545 0.44117647 0.45714286 0.41463415 0.53125 0.47058824 0.425 0.47222222 0.4 0.44444444] mean value: 0.4511003830578795 key: test_recall value: [1. 0.66666667 0.33333333 0. 0.33333333 0.66666667 0.33333333 0.33333333 0.66666667 0.66666667] mean value: 0.5 key: train_recall value: [0.55555556 0.55555556 0.59259259 0.62962963 0.62962963 0.59259259 0.62962963 0.62962963 0.51851852 0.59259259] mean value: 0.5925925925925926 key: test_roc_auc value: [0.99090909 0.80606061 0.65757576 0.49090909 0.63030303 0.82407407 0.63888889 0.64814815 0.78703704 0.81481481] mean value: 0.7288720538720539 key: train_roc_auc value: [0.75941043 0.75839002 0.77690854 0.79032502 0.79950869 0.77796636 0.79139323 0.79546655 0.73787433 0.7759297 ] mean value: 0.7763172863623838 key: test_jcc value: [0.75 0.33333333 0.25 0. 
0.14285714 0.5 0.16666667 0.2 0.25 0.4 ] mean value: 0.29928571428571427 key: train_jcc value: [0.33333333 0.32608696 0.34782609 0.33333333 0.4047619 0.35555556 0.34 0.36956522 0.29166667 0.34042553] mean value: 0.34425545864352525 MCC on Blind test: 0.27 Accuracy on Blind test: 0.59 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01444721 0.01985002 0.02024388 0.02129292 0.02039051 0.01963663 0.01987505 0.01990676 0.02227616 0.02153397] mean value: 0.019945311546325683 key: score_time value: [0.01014018 0.01180124 0.01179695 0.01186323 0.01182628 0.01188421 0.01184916 0.01185966 0.01190591 0.01182818] mean value: 0.011675500869750976 key: test_mcc value: [0.76038268 0.56713087 0.38251843 0.43192347 0.5508895 0.56694671 0.55039532 0.38204659 0.56694671 0.48238191] mean value: 0.5241562187130722 key: train_mcc value: [0.87704592 0.76167305 0.98030696 0.56113172 0.89861406 0.7364114 0.83254034 0.87924838 0.87705763 0.8711254 ] mean value: 0.8275154861185016 key: test_accuracy value: [0.96551724 0.96551724 0.94827586 0.9137931 0.94827586 0.96491228 0.94736842 0.94736842 0.96491228 0.92982456] mean value: 0.9495765275257109 key: train_accuracy value: [0.98839458 0.9787234 0.99806576 0.9032882 0.99032882 0.97683398 0.98455598 0.98841699 0.98841699 0.98455598] mean value: 0.9781580696474313 key: test_fscore value: [0.75 0.5 0.4 0.44444444 0.57142857 0.5 0.57142857 0.4 0.5 0.5 ] mean value: 0.5137301587301587 key: train_fscore value: [0.88 0.75555556 0.98113208 0.51923077 0.90196078 0.71428571 0.83333333 0.88461538 0.88 0.87096774] mean value: 0.8221081358741665 key: test_precision value: [0.6 1. 0.5 0.33333333 0.5 1. 0.5 0.5 1. 0.4 ] mean value: 0.6333333333333333 key: train_precision value: [0.95652174 0.94444444 1. 0.35064935 0.95833333 1. 0.95238095 0.92 0.95652174 0.77142857] mean value: 0.8810280130497522 key: test_recall value: [1. 0.33333333 0.33333333 0.66666667 0.66666667 0.33333333 0.66666667 0.33333333 0.33333333 0.66666667] mean value: 0.5333333333333333 key: train_recall value: [0.81481481 0.62962963 0.96296296 1. 0.85185185 0.55555556 0.74074074 0.85185185 0.81481481 1. 
] mean value: 0.8222222222222222 key: test_roc_auc value: [0.98181818 0.66666667 0.65757576 0.7969697 0.81515152 0.66666667 0.81481481 0.65740741 0.66666667 0.80555556] mean value: 0.7529292929292929 key: train_roc_auc value: [0.906387 0.81379441 0.98148148 0.94897959 0.92490552 0.77777778 0.86935204 0.92388927 0.90638908 0.99185336] mean value: 0.9044809519191248 key: test_jcc value: [0.6 0.33333333 0.25 0.28571429 0.4 0.33333333 0.4 0.25 0.33333333 0.33333333] mean value: 0.3519047619047619 key: train_jcc value: [0.78571429 0.60714286 0.96296296 0.35064935 0.82142857 0.55555556 0.71428571 0.79310345 0.78571429 0.77142857] mean value: 0.7147985603158017 MCC on Blind test: 0.3 Accuracy on Blind test: 0.6 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01839423 0.01934814 0.01817775 0.01901317 0.01911378 0.01753163 0.01846671 0.0182879 0.01776218 0.01677585] mean value: 0.018287134170532227 key: score_time value: [0.01187444 0.01188636 0.01183009 0.011832 0.01181626 0.01186419 0.01186252 0.01177549 0.01186895 0.01181245] mean value: 0.01184227466583252 key: test_mcc value: [0.76038268 0.45726459 0.56713087 0.38251843 0.2969697 0.56694671 0.55039532 0.64814815 0.76011695 0.24638016] mean value: 0.5236253563585851 key: train_mcc value: [0.88276644 0.4252731 0.73639347 0.82784768 0.92184429 0.85459272 0.80911474 0.78890378 0.65218543 0.49165645] mean value: 0.7390578100781762 key: test_accuracy value: [0.96551724 0.84482759 0.96551724 0.94827586 0.93103448 0.96491228 0.94736842 0.96491228 0.96491228 0.78947368] mean value: 0.9286751361161525 key: train_accuracy value: [0.98839458 0.83172147 0.97678917 0.98065764 0.99226306 0.98648649 0.98262548 0.97297297 0.94208494 0.86679537] mean value: 0.9520791169727341 key: test_fscore value: [0.75 0.4 0.5 0.4 0.33333333 0.5 0.57142857 0.66666667 0.75 0.25 ] mean value: 0.5121428571428571 key: train_fscore value: [0.88888889 0.37410072 0.71428571 0.83333333 0.92592593 0.85106383 0.8 0.78787879 0.63414634 0.43902439] mean value: 0.7248647931231662 key: test_precision value: [0.6 0.25 1. 0.5 0.33333333 1. 0.5 0.66666667 0.6 0.15384615] mean value: 0.5603846153846154 key: train_precision value: [0.88888889 0.23214286 1. 0.75757576 0.92592593 1. 1. 0.66666667 0.47272727 0.28125 ] mean value: 0.7225177368927369 key: test_recall value: [1. 1. 0.33333333 0.33333333 0.33333333 0.33333333 0.66666667 0.66666667 1. 0.66666667] mean value: 0.6333333333333333 key: train_recall value: [0.88888889 0.96296296 0.55555556 0.92592593 0.92592593 0.74074074 0.66666667 0.96296296 0.96296296 1. ] mean value: 0.8592592592592593 key: test_roc_auc value: [0.98181818 0.91818182 0.66666667 0.65757576 0.64848485 0.66666667 0.81481481 0.82407407 0.98148148 0.73148148] mean value: 0.7891245791245791 key: train_roc_auc value: [0.94138322 0.89372638 0.77777778 0.9547997 0.96092215 0.87037037 0.83333333 0.96824319 0.95194991 0.92973523] mean value: 0.9082241264915109 key: test_jcc value: [0.6 0.25 0.33333333 0.25 0.2 0.33333333 0.4 0.5 0.6 0.14285714] mean value: 0.36095238095238097 key: train_jcc value: [0.8 0.2300885 0.55555556 0.71428571 0.86206897 0.74074074 0.66666667 0.65 0.46428571 0.28125 ] mean value: 0.5964941852626854 MCC on Blind test: 0.33 Accuracy on Blind test: 0.61 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. 
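The UndefinedMetricWarning repeated here is raised whenever a CV fold produces no positive predictions, which is why several test_precision and test_fscore folds above are exactly 0. As the message itself suggests, the warning (not the zero scores) can be silenced by passing zero_division to the affected scorers; a sketch under that assumption:

# Sketch: precision/recall/F1 scorers that return 0.0 silently when a fold has
# no predicted positive samples (zero_division=0), as the warning text advises.
from sklearn.metrics import make_scorer, precision_score, recall_score, f1_score

quiet_scorers = {
    'precision': make_scorer(precision_score, zero_division=0),
    'recall':    make_scorer(recall_score, zero_division=0),
    'fscore':    make_scorer(f1_score, zero_division=0),
}
# These can replace the corresponding entries in the scoring dict passed to
# cross_validate; the fold values stay 0.0, only the warning disappears.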
Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20579267 0.20070958 0.19566107 0.1949234 0.19549155 0.1920774 0.19445324 0.19372702 0.19367528 0.19807458] mean value: 0.1964585781097412 key: score_time value: [0.01597142 0.01675367 0.01633692 0.0162847 0.01555514 0.01612735 0.01671433 0.01590085 0.01547456 0.01589966] mean value: 0.016101861000061037 key: test_mcc value: [0.43192347 0.76038268 0.56713087 0. 0.56713087 0. 0.38204659 0.55039532 0.80903983 0.2962963 ] mean value: 0.4364345939424672 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9137931 0.96551724 0.96551724 0.94827586 0.96551724 0.94736842 0.94736842 0.94736842 0.98245614 0.92982456] mean value: 0.9513006654567453 key: train_accuracy value: [1. 1. 1. 1. 
1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.44444444 0.75 0.5 0. 0.5 0. 0.4 0.57142857 0.8 0.33333333] mean value: 0.4299206349206349 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.33333333 0.6 1. 0. 1. 0. 0.5 0.5 1. 0.33333333] mean value: 0.5266666666666666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 1. 0.33333333 0. 0.33333333 0. 0.33333333 0.66666667 0.66666667 0.33333333] mean value: 0.4333333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.7969697 0.98181818 0.66666667 0.5 0.66666667 0.5 0.65740741 0.81481481 0.83333333 0.64814815] mean value: 0.7065824915824915 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.28571429 0.6 0.33333333 0. 0.33333333 0. 0.25 0.4 0.66666667 0.2 ] mean value: 0.3069047619047619 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.58 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
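The "Some inputs do not have OOB scores" UserWarning, and the true_divide RuntimeWarning that follows it, come from BaggingClassifier(oob_score=True) with its default 10 estimators: with so few bootstrap rounds some training rows are never left out of bag, so their OOB vote counts are zero and the normalisation divides by zero. A hedged remedy, with an estimator count that is illustrative rather than taken from the script:

# Sketch: more estimators make it overwhelmingly likely that every training row
# is out-of-bag at least once, so the OOB warnings above should not appear.
from sklearn.ensemble import BaggingClassifier

bagging_model = BaggingClassifier(
    n_estimators=100,   # default is 10; 100 is an illustrative value
    oob_score=True,
    n_jobs=10,
    random_state=42,
)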
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05620265 0.06906557 0.07550597 0.0792644 0.08082414 0.07877135 0.06620526 0.07880783 0.06887841 0.0823257 ] mean value: 0.07358512878417969 key: score_time value: [0.02078819 0.02542567 0.02489471 0.03681445 0.02469873 0.0268507 0.02675176 0.02348495 0.02814078 0.03065825] mean value: 0.02685081958770752 key: test_mcc value: [ 0.64848485 0.80917359 0.56713087 0.56713087 -0.03093441 0.80903983 0.38204659 0.80903983 0.56694671 0.38204659] mean value: 0.5510105333468213 key: train_mcc value: [0.93993608 0.93993608 0.98030696 0.98030696 0.89861406 0.98030899 0.96029664 0.96029664 0.94053145 0.98030899] mean value: 0.9560842846865666 key: test_accuracy value: [0.96551724 0.98275862 0.96551724 0.96551724 0.93103448 0.98245614 0.94736842 0.98245614 0.96491228 0.94736842] mean value: 0.9634906231094978 key: train_accuracy value: [0.99419729 0.99419729 0.99806576 0.99806576 0.99032882 0.9980695 0.996139 0.996139 0.99420849 0.9980695 ] mean value: 0.9957480414927223 key: test_fscore value: [0.66666667 0.8 0.5 0.5 0. 0.8 0.4 0.8 0.5 0.4 ] mean value: 0.5366666666666666 key: train_fscore value: [0.94117647 0.94117647 0.98113208 0.98113208 0.90196078 0.98113208 0.96153846 0.96153846 0.94339623 0.98113208] mean value: 0.9575315176869006 key: test_precision value: [0.66666667 1. 1. 1. 0. 1. 0.5 1. 1. 0.5 ] mean value: 0.7666666666666666 key: train_precision value: [1. 1. 1. 1. 0.95833333 1. 1. 1. 0.96153846 1. ] mean value: 0.9919871794871795 key: test_recall value: [0.66666667 0.66666667 0.33333333 0.33333333 0. 0.66666667 0.33333333 0.66666667 0.33333333 0.33333333] mean value: 0.4333333333333333 key: train_recall value: [0.88888889 0.88888889 0.96296296 0.96296296 0.85185185 0.96296296 0.92592593 0.92592593 0.92592593 0.96296296] mean value: 0.9259259259259259 key: test_roc_auc value: [0.82424242 0.83333333 0.66666667 0.66666667 0.49090909 0.83333333 0.65740741 0.83333333 0.66666667 0.65740741] mean value: 0.712996632996633 key: train_roc_auc value: [0.94444444 0.94444444 0.98148148 0.98148148 0.92490552 0.98148148 0.96296296 0.96296296 0.96194463 0.98148148] mean value: 0.9627590891527464 key: test_jcc value: [0.5 0.66666667 0.33333333 0.33333333 0. 
0.66666667 0.25 0.66666667 0.33333333 0.25 ] mean value: 0.39999999999999997 key: train_jcc value: [0.88888889 0.88888889 0.96296296 0.96296296 0.82142857 0.96296296 0.92592593 0.92592593 0.89285714 0.96296296] mean value: 0.9195767195767195 MCC on Blind test: 0.22 Accuracy on Blind test: 0.56 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.1794157 0.19253588 0.17404079 0.15018892 0.12815928 0.18164134 0.12826395 0.17812467 0.17865038 0.18234944] mean value: 0.1673370361328125 key: score_time value: [0.02564001 0.02518463 0.01993012 0.0287869 0.02011561 0.01578426 0.02864242 0.02854109 0.02530122 0.02950859] mean value: 0.024743485450744628 key: test_mcc value: [0.56713087 0.56713087 0. 0. 0. 0. 0. 0. 0.56694671 0. 
] mean value: 0.17012084551450418 key: train_mcc value: [0.93993608 0.91921394 0.96029266 0.91921394 0.93993608 0.9399419 0.91922152 0.91922152 0.91922152 0.91922152] mean value: 0.9295420672150851 key: test_accuracy value: [0.96551724 0.96551724 0.94827586 0.94827586 0.94827586 0.94736842 0.94736842 0.94736842 0.96491228 0.94736842] mean value: 0.9530248033877797 key: train_accuracy value: [0.99419729 0.99226306 0.99613153 0.99226306 0.99419729 0.99420849 0.99227799 0.99227799 0.99227799 0.99227799] mean value: 0.9932372687691837 key: test_fscore value: [0.5 0.5 0. 0. 0. 0. 0. 0. 0.5 0. ] mean value: 0.15 key: train_fscore value: [0.94117647 0.92 0.96153846 0.92 0.94117647 0.94117647 0.92 0.92 0.92 0.92 ] mean value: 0.9305067873303168 key: test_precision value: [1. 1. 0. 0. 0. 0. 0. 0. 1. 0.] mean value: 0.3 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.33333333 0.33333333 0. 0. 0. 0. 0. 0. 0.33333333 0. ] mean value: 0.09999999999999999 key: train_recall value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889 0.85185185 0.85185185 0.85185185 0.85185185] mean value: 0.8703703703703703 key: test_roc_auc value: [0.66666667 0.66666667 0.5 0.5 0.5 0.5 0.5 0.5 0.66666667 0.5 ] mean value: 0.55 key: train_roc_auc value: [0.94444444 0.92592593 0.96296296 0.92592593 0.94444444 0.94444444 0.92592593 0.92592593 0.92592593 0.92592593] mean value: 0.9351851851851852 key: test_jcc value: [0.33333333 0.33333333 0. 0. 0. 0. 0. 0. 0.33333333 0. ] mean value: 0.09999999999999999 key: train_jcc value: [0.88888889 0.85185185 0.92592593 0.85185185 0.88888889 0.88888889 0.85185185 0.85185185 0.85185185 0.85185185] mean value: 0.8703703703703703 MCC on Blind test: 0.09 Accuracy on Blind test: 0.52 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.78224468 0.77179456 0.7493031 0.78262949 0.78038216 0.78873324 0.78629851 0.78719425 0.77175832 0.78670549] mean value: 0.7787043809890747 key: score_time value: [0.00961399 0.00944686 0.01149893 0.0093987 0.01046252 0.00997686 0.0101912 0.00940299 0.01038051 0.01013303] mean value: 0.010050559043884277 key: test_mcc value: [0.64848485 0.64848485 0.56713087 0.56713087 0.56713087 0.64814815 0.64814815 0.64814815 1. 0.55039532] mean value: 0.6493202081863393 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96551724 0.96551724 0.96551724 0.96551724 0.96551724 0.96491228 0.96491228 0.96491228 1. 0.94736842] mean value: 0.9669691470054447 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.66666667 0.5 0.5 0.5 0.66666667 0.66666667 0.66666667 1. 0.57142857] mean value: 0.6404761904761904 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.66666667 1. 1. 1. 0.66666667 0.66666667 0.66666667 1. 0.5 ] mean value: 0.7833333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667 0.66666667 0.66666667 1. 0.66666667] mean value: 0.6 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.82424242 0.82424242 0.66666667 0.66666667 0.66666667 0.82407407 0.82407407 0.82407407 1. 0.81481481] mean value: 0.7935521885521886 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.5 0.33333333 0.33333333 0.33333333 0.5 0.5 0.5 1. 0.4 ] mean value: 0.49 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
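The QDA run reported next fills the log with "Variables are collinear" UserWarnings: with 169 numerical features, many AAindex columns are nearly linearly dependent, so the per-class covariance matrices QDA estimates are close to singular. One hedged mitigation, not something the script is shown doing, is to regularise those covariances via reg_param:

# Sketch: shrink each class covariance toward a scaled identity so that it
# stays invertible when features are collinear; 0.1 is an illustrative value.
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda_model = QuadraticDiscriminantAnalysis(reg_param=0.1)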
mean value: 1.0 MCC on Blind test: 0.27 Accuracy on Blind test: 0.58 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent',
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0282979 0.02876115 0.02958274 0.02728486 0.02812648 0.02823567 0.02803469 0.02771449 0.02794123 0.02837539] mean value: 0.028235459327697755 key: score_time value: [0.0126543 0.01246667 0.01491022 0.01432562 0.01413298 0.01427579 0.01446295 0.01423359 0.01441836 0.0146451 ] mean value: 0.014052557945251464 key: test_mcc value: [ 0.68755165 0.1762953 -0.05454545 -0.04413674 -0.05454545 0.24282147 -0.05555556 0.1134023 0.17516462 0.2962963 ] mean value: 0.14827484228206453 key: train_mcc value: [0.37617287 0.4990914 0.53407501 0.4990914 0.4990914 0.46163583 0.53409532 0.56704983 0.46163583 0.46163583] mean value: 0.4893574733872949 key: test_accuracy value: [0.94827586 0.87931034 0.89655172 0.9137931 0.89655172 0.9122807 0.89473684 0.8245614 0.87719298 0.92982456] mean value: 0.897307924984876 key: train_accuracy value: [0.95551257 0.96131528 0.96324952 0.96131528 0.96131528 0.95945946 0.96332046 0.96525097 0.95945946 0.95945946] mean value: 0.9609657737317312 key: test_fscore value: [0.66666667 0.22222222 0. 0. 0. 0.28571429 0. 0.16666667 0.22222222 0.33333333] mean value: 0.18968253968253967 key: train_fscore value: [0.25806452 0.41176471 0.45714286 0.41176471 0.41176471 0.36363636 0.45714286 0.5 0.36363636 0.36363636] mean value: 0.3998553438970896 key: test_precision value: [0.5 0.16666667 0. 0. 0. 0.25 0. 0.11111111 0.16666667 0.33333333] mean value: 0.15277777777777776 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.33333333 0. 0. 0. 0.33333333 0. 0.33333333 0.33333333 0.33333333] mean value: 0.26666666666666666 key: train_recall value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222 0.2962963 0.33333333 0.22222222 0.22222222] mean value: 0.2518518518518518 key: test_roc_auc value: [0.97272727 0.62121212 0.47272727 0.48181818 0.47272727 0.63888889 0.47222222 0.59259259 0.62037037 0.64814815] mean value: 0.5993434343434343 key: train_roc_auc value: [0.57407407 0.62962963 0.64814815 0.62962963 0.62962963 0.61111111 0.64814815 0.66666667 0.61111111 0.61111111] mean value: 0.625925925925926 key: test_jcc value: [0.5 0.125 0. 0. 0. 0.16666667 0. 
0.09090909 0.125 0.2 ] mean value: 0.12075757575757576 key: train_jcc value: [0.14814815 0.25925926 0.2962963 0.25925926 0.25925926 0.22222222 0.2962963 0.33333333 0.22222222 0.22222222] mean value: 0.2518518518518518 MCC on Blind test: -0.05 Accuracy on Blind test: 0.5 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02868271 0.03869915 0.03853083 0.03858232 0.03899741 0.03781533 0.04338479 0.03836989 0.03853822 0.03898597] mean value: 0.03805866241455078 key: score_time value: [0.01904464 0.0187161 0.01863718 0.01865602 0.0190537 0.01886559 0.01899195 0.01890612 0.01882076 0.01871467] mean value: 0.018840670585632324 key: test_mcc value: [1. 0.80917359 0.56713087 0.56713087 0.38251843 0.80903983 0.2962963 0.64814815 0.56694671 0.64814815] mean value: 0.6294532902524936 key: train_mcc value: [0.73676407 0.78589709 0.8325254 0.8321053 0.76167305 0.76169209 0.76169209 0.76169209 0.76131958 0.80951339] mean value: 0.7804874132160566 key: test_accuracy value: [1. 0.98275862 0.96551724 0.96551724 0.94827586 0.98245614 0.92982456 0.96491228 0.96491228 0.96491228] mean value: 0.9669086509376891 key: train_accuracy value: [0.97678917 0.98065764 0.98452611 0.98452611 0.9787234 0.97876448 0.97876448 0.97876448 0.97876448 0.98262548] mean value: 0.9802905834820729 key: test_fscore value: [1. 0.8 0.5 0.5 0.4 0.8 0.33333333 0.66666667 0.5 0.66666667] mean value: 0.6166666666666667 key: train_fscore value: [0.72727273 0.7826087 0.83333333 0.82608696 0.75555556 0.75555556 0.75555556 0.75555556 0.74418605 0.80851064] mean value: 0.7744220619811697 key: test_precision value: [1. 1. 1. 1. 0.5 1. 0.33333333 0.66666667 1. 0.66666667] mean value: 0.8166666666666667 key: train_precision value: [0.94117647 0.94736842 0.95238095 1. 0.94444444 0.94444444 0.94444444 0.94444444 1. 0.95 ] mean value: 0.9568703621799597 key: test_recall value: [1. 0.66666667 0.33333333 0.33333333 0.33333333 0.66666667 0.33333333 0.66666667 0.33333333 0.66666667] mean value: 0.5333333333333333 key: train_recall value: [0.59259259 0.66666667 0.74074074 0.7037037 0.62962963 0.62962963 0.62962963 0.62962963 0.59259259 0.7037037 ] mean value: 0.6518518518518519 key: test_roc_auc value: [1. 0.83333333 0.66666667 0.66666667 0.65757576 0.83333333 0.64814815 0.82407407 0.66666667 0.82407407] mean value: 0.762053872053872 key: train_roc_auc value: [0.79527589 0.83231293 0.86934996 0.85185185 0.81379441 0.81379648 0.81379648 0.81379648 0.7962963 0.85083352] mean value: 0.8251104306850597 key: test_jcc value: [1. 
0.66666667 0.33333333 0.33333333 0.25 0.66666667 0.2 0.5 0.33333333 0.5 ] mean value: 0.47833333333333333 key: train_jcc value: [0.57142857 0.64285714 0.71428571 0.7037037 0.60714286 0.60714286 0.60714286 0.60714286 0.59259259 0.67857143] mean value: 0.6332010582010582 MCC on Blind test: 0.25 Accuracy on Blind test: 0.57 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.21960998 0.27916813 0.2967515 0.17903543 0.30591917 0.16244292 0.22275519 0.29180765 0.29906225 0.32112122] mean value: 0.2577673435211182 key: score_time value: [0.01901317 0.01612353 0.0195241 0.01285934 0.0123415 0.01899672 0.02589607 0.02119637 0.01926637 0.02765989] mean value: 0.019287705421447754 key: test_mcc value: [0.80917359 0.80917359 0.56713087 0. 
0.38251843 0.80903983 0.2962963 0.64814815 0.56694671 0.64814815] mean value: 0.5536575623422023 key: train_mcc value: [0.68418345 0.65669104 0.8325254 0.71071615 0.76167305 0.76169209 0.76169209 0.76169209 0.76131958 0.65717642] mean value: 0.7349361348491141 key: test_accuracy value: [0.98275862 0.98275862 0.96551724 0.94827586 0.94827586 0.98245614 0.92982456 0.96491228 0.96491228 0.96491228] mean value: 0.9634603750756201 key: train_accuracy value: [0.9729207 0.97098646 0.98452611 0.97485493 0.9787234 0.97876448 0.97876448 0.97876448 0.97876448 0.97104247] mean value: 0.9768111991516247 key: test_fscore value: [0.8 0.8 0.5 0. 0.4 0.8 0.33333333 0.66666667 0.5 0.66666667] mean value: 0.5466666666666666 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:114: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:117: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.65 0.61538462 0.83333333 0.68292683 0.75555556 0.75555556 0.75555556 0.75555556 0.74418605 0.63414634] mean value: 0.7182199388183507 key: test_precision value: [1. 1. 1. 0. 0.5 1. 0.33333333 0.66666667 1. 0.66666667] mean value: 0.7166666666666667 key: train_precision value: [1. 1. 0.95238095 1. 0.94444444 0.94444444 0.94444444 0.94444444 1. 0.92857143] mean value: 0.9658730158730159 key: test_recall value: [0.66666667 0.66666667 0.33333333 0. 0.33333333 0.66666667 0.33333333 0.66666667 0.33333333 0.66666667] mean value: 0.4666666666666666 key: train_recall value: [0.48148148 0.44444444 0.74074074 0.51851852 0.62962963 0.62962963 0.62962963 0.62962963 0.59259259 0.48148148] mean value: 0.5777777777777777 key: test_roc_auc value: [0.83333333 0.83333333 0.66666667 0.5 0.65757576 0.83333333 0.64814815 0.82407407 0.66666667 0.82407407] mean value: 0.7287205387205387 key: train_roc_auc value: [0.74074074 0.72222222 0.86934996 0.75925926 0.81379441 0.81379648 0.81379648 0.81379648 0.7962963 0.73972241] mean value: 0.7882774752806758 key: test_jcc value: [0.66666667 0.66666667 0.33333333 0. 
0.25 0.66666667 0.2 0.5 0.33333333 0.5 ] mean value: 0.4116666666666666 key: train_jcc value: [0.48148148 0.44444444 0.71428571 0.51851852 0.60714286 0.60714286 0.60714286 0.60714286 0.59259259 0.46428571] mean value: 0.5644179894179894 MCC on Blind test: 0.21 Accuracy on Blind test: 0.55 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04487562 0.04138589 0.04369426 0.04303861 0.04680061 0.04244494 0.03946328 0.04193759 0.04206204 0.03964663] mean value: 0.04253494739532471 key: score_time value: [0.01250577 0.01236224 0.01433182 0.01480269 0.02338958 0.01450467 0.01239443 0.01240659 0.01476336 0.0124073 ] mean value: 0.014386844635009766 key: test_mcc value: [0.88989899 0.92724828 0.98181818 0.94511785 0.96396098 0.96393713 0.94635821 0.98181211 0.96396098 0.92905936] mean value: 0.9493172082658792 key: train_mcc value: [0.97573744 0.96750835 0.96550335 0.96142511 0.96543927 0.96343147 0.97158737 0.96750942 0.96550464 0.97158737] mean value: 0.9675233803087637 key: test_accuracy value: [0.94495413 0.96330275 0.99082569 0.97247706 0.98165138 0.98165138 0.97247706 0.99082569 0.98165138 0.96330275] mean value: 0.9743119266055046 key: train_accuracy value: [0.98776758 0.98369011 0.98267074 0.98063201 0.98267074 0.98165138 0.98572885 0.98369011 0.98267074 0.98572885] mean value: 0.9836901121304791 key: test_fscore value: [0.94444444 0.96363636 0.99082569 0.97247706 0.98181818 0.98214286 0.97345133 0.99099099 0.98148148 0.96491228] mean value: 0.974618067994328 key: train_fscore value: [0.98790323 0.98383838 0.98284561 0.98082745 0.98281092 0.98178138 0.98582996 0.98380567 0.98281092 0.98582996] mean value: 0.9838283470967917 key: test_precision value: [0.94444444 0.94642857 0.98181818 0.96363636 0.96428571 0.96491228 0.94827586 0.98214286 1. 0.93220339] mean value: 0.962814766535736 key: train_precision value: [0.97804391 0.9759519 0.974 0.972 0.97590361 0.97389558 0.97791165 0.97590361 0.9739479 0.97791165] mean value: 0.9755469816192518 key: test_recall value: [0.94444444 0.98148148 1. 0.98148148 1. 1. 1. 1. 0.96363636 1. ] mean value: 0.9871043771043772 key: train_recall value: [0.99796334 0.99185336 0.99185336 0.9898167 0.9898167 0.98979592 0.99387755 0.99183673 0.99183673 0.99387755] mean value: 0.992252795211771 key: test_roc_auc value: [0.94494949 0.96346801 0.99090909 0.97255892 0.98181818 0.98148148 0.97222222 0.99074074 0.98181818 0.96296296] mean value: 0.9742929292929293 key: train_roc_auc value: [0.98775718 0.98368178 0.98266137 0.98062264 0.98266345 0.98165967 0.98573715 0.98369841 0.98268008 0.98573715] mean value: 0.9836898873602394 key: test_jcc value: [0.89473684 0.92982456 0.98181818 0.94642857 0.96428571 0.96491228 0.94827586 0.98214286 0.96363636 0.93220339] mean value: 0.9508264624421688 key: train_jcc value: [0.97609562 0.96819085 0.96626984 0.96237624 0.96620278 0.96421471 0.97205589 0.96812749 0.96620278 0.97205589] mean value: 0.9681792096111226 MCC on Blind test: 0.3 Accuracy on Blind test: 0.6 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.99837112 1.1198895 1.16606784 1.38026643 1.1718266 1.01276398 1.08470702 1.11438894 1.01340914 1.17820048] mean value: 1.1239891052246094 key: score_time value: [0.01469374 0.01463389 0.02088499 0.01533866 0.01591563 0.01322365 0.01592922 0.01325226 0.02350879 0.01339555] mean value: 0.016077637672424316 key: test_mcc value: [0.90841751 0.92724828 0.98181818 0.96329966 0.96396098 0.96393713 0.92905936 0.96329966 0.94511785 0.92905936] mean value: 0.947521797154391 key: train_mcc value: [0.9857799 0.9837635 0.97767312 0.99796333 0.9857799 0.95334985 0.9817516 0.94927144 0.9534452 0.9817516 ] mean value: 0.9750529452374251 key: test_accuracy value: [0.95412844 0.96330275 0.99082569 0.98165138 0.98165138 0.98165138 0.96330275 0.98165138 0.97247706 0.96330275] mean value: 0.9733944954128441 key: train_accuracy value: [0.99286442 0.99184506 0.98878695 0.99898063 0.99286442 0.97655454 0.99082569 0.9745158 0.97655454 0.99082569] mean value: 0.9874617737003059 key: test_fscore value: [0.95412844 0.96363636 0.99082569 0.98148148 0.98181818 0.98214286 0.96491228 0.98181818 0.97247706 0.96491228] mean value: 0.9738152819961126 key: train_fscore value: [0.9929078 0.99190283 0.98887765 0.99898271 0.9929078 0.97679112 0.99088146 0.97477296 0.97683787 0.99088146] mean value: 0.9875743656721899 key: test_precision value: [0.94545455 0.94642857 0.98181818 0.98148148 0.96428571 0.96491228 0.93220339 0.98181818 0.98148148 0.93220339] mean value: 0.9612087218130929 key: train_precision value: [0.98790323 0.98591549 0.98192771 0.99796748 0.98790323 0.96606786 0.98390342 0.96407186 0.96421471 0.98390342] mean value: 0.9803778408423602 key: test_recall value: [0.96296296 0.98148148 1. 0.98148148 1. 1. 1. 0.98181818 0.96363636 1. ] mean value: 0.9871380471380471 key: train_recall value: [0.99796334 0.99796334 0.99592668 1. 
0.99796334 0.9877551 0.99795918 0.98571429 0.98979592 0.99795918] mean value: 0.9949000374080386 key: test_roc_auc value: [0.95420875 0.96346801 0.99090909 0.98164983 0.98181818 0.98148148 0.96296296 0.98164983 0.97255892 0.96296296] mean value: 0.9733670033670033 key: train_roc_auc value: [0.99285922 0.99183881 0.98877967 0.99897959 0.99285922 0.97656594 0.99083295 0.9745272 0.97656802 0.99083295] mean value: 0.9874643584521385 key: test_jcc value: [0.9122807 0.92982456 0.98181818 0.96363636 0.96428571 0.96491228 0.93220339 0.96428571 0.94642857 0.93220339] mean value: 0.9491878868975211 key: train_jcc value: [0.98591549 0.98393574 0.978 0.99796748 0.98591549 0.95463511 0.98192771 0.9507874 0.95472441 0.98192771] mean value: 0.9755736549753808 MCC on Blind test: 0.32 Accuracy on Blind test: 0.61 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
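The key/value blocks printed for each model (fit_time, score_time, test_mcc, train_mcc and so on, ten values apiece) match the dictionary returned by scikit-learn's cross_validate with return_train_score=True and a custom MCC scorer over ten folds. The sketch below shows where such keys come from on synthetic data; the scorer names and the StratifiedKFold settings are assumptions for illustration, not taken from the script.

    # Sketch: per-fold score dictionaries of the kind shown in this log
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, make_scorer, matthews_corrcoef
    from sklearn.model_selection import StratifiedKFold, cross_validate

    X, y = make_classification(n_samples=300, n_features=20, weights=[0.9], random_state=42)
    scoring = {'mcc': make_scorer(matthews_corrcoef), 'accuracy': make_scorer(accuracy_score)}
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    scores = cross_validate(LogisticRegression(max_iter=1000, random_state=42),
                            X, y, cv=skf, scoring=scoring, return_train_score=True)
    # keys: fit_time, score_time, test_mcc, train_mcc, test_accuracy, train_accuracy
    print({k: round(v.mean(), 3) for k, v in scores.items()})
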
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01865888 0.01219416 0.0124011 0.01228523 0.01231241 0.01269364 0.01386428 0.0126965 0.01259089 0.01382422] mean value: 0.013352131843566895 key: score_time value: [0.07297492 0.00939155 0.00941801 0.00939441 0.00960684 0.01031232 0.00982857 0.00956607 0.00982213 0.00963259] mean value: 0.015994739532470704 key: test_mcc value: [0.85382288 0.85560229 0.84200168 0.85382288 0.80953606 0.87280881 0.83496131 0.8582788 0.98181818 0.8582788 ] mean value: 0.8620931700334511 key: train_mcc value: [0.8661361 0.86728222 0.86560142 0.87136867 0.87160834 0.86946083 0.87788349 0.87094841 0.8476271 0.86997051] mean value: 0.8677887071504141 key: test_accuracy value: [0.9266055 0.9266055 0.91743119 0.9266055 0.89908257 0.93577982 0.91743119 0.9266055 0.99082569 0.9266055 ] mean value: 0.9293577981651376 key: train_accuracy value: [0.93170234 0.93272171 0.93170234 0.93476045 0.93476045 0.93374108 0.93781855 0.93476045 0.92150866 0.93374108] mean value: 0.9327217125382263 key: test_fscore value: [0.92727273 0.92857143 0.92173913 0.92727273 0.90598291 0.9380531 0.91891892 0.93103448 0.99082569 0.93103448] mean value: 0.9320705589389259 key: train_fscore value: [0.93437806 0.93491124 0.93411996 0.93688363 0.93700787 0.93583416 0.93990148 0.93650794 0.92531523 0.93608653] mean value: 0.9350946094457768 key: test_precision value: [0.91071429 0.89655172 0.86885246 0.91071429 0.84126984 0.9137931 0.91071429 0.8852459 1. 
0.8852459 ] mean value: 0.9023101788293987 key: train_precision value: [0.9 0.90630975 0.90304183 0.9082218 0.90666667 0.90630975 0.90857143 0.91119691 0.88170055 0.90322581] mean value: 0.9035244492701532 key: test_recall value: [0.94444444 0.96296296 0.98148148 0.94444444 0.98148148 0.96363636 0.92727273 0.98181818 0.98181818 0.98181818] mean value: 0.9651178451178452 key: train_recall value: [0.97148676 0.96537678 0.96741344 0.96741344 0.9694501 0.96734694 0.97346939 0.96326531 0.97346939 0.97142857] mean value: 0.9690120121368303 key: test_roc_auc value: [0.92676768 0.92693603 0.91801347 0.92676768 0.89983165 0.93552189 0.91734007 0.92609428 0.99090909 0.92609428] mean value: 0.9294276094276094 key: train_roc_auc value: [0.93166175 0.93268839 0.9316659 0.93472713 0.93472505 0.9337753 0.93785486 0.93478948 0.92156158 0.93377946] mean value: 0.9327228895631572 key: test_jcc value: [0.86440678 0.86666667 0.85483871 0.86440678 0.828125 0.88333333 0.85 0.87096774 0.98181818 0.87096774] mean value: 0.8735530934688602 key: train_jcc value: [0.87683824 0.87777778 0.87638376 0.8812616 0.88148148 0.87940631 0.8866171 0.88059701 0.86101083 0.87985213] mean value: 0.8781226233231253 MCC on Blind test: 0.29 Accuracy on Blind test: 0.61 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01229978 0.01690793 0.0169034 0.01696324 0.01693296 0.01681113 0.01681638 0.01694918 0.01694226 0.01689982] mean value: 0.01644260883331299 key: score_time value: [0.01223588 0.01235938 0.0123477 0.01233745 0.01240754 0.01235318 0.01235676 0.01237297 0.01232409 0.01234961] mean value: 0.012344455718994141 key: test_mcc value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899 0.85319865 0.81649832 0.83501684 0.76161616] mean value: 0.8160124823784499 key: train_mcc value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391 0.81246155 0.81448891 0.81246155 0.82059712] mean value: 0.816547573939551 key: test_accuracy value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413 0.9266055 0.90825688 0.91743119 0.88073394] mean value: 0.9073394495412844 key: train_accuracy value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941 0.90621814 0.90723751 0.90621814 0.91029562] mean value: 0.908256880733945 key: test_fscore value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455 0.92727273 0.90909091 0.91743119 0.88073394] mean value: 0.9069639782126365 key: train_fscore value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852 0.9057377 0.90685773 0.9057377 0.91002045] mean value: 0.9079151055700868 key: test_precision value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455 0.92727273 0.90909091 0.92592593 0.88888889] mean value: 0.9109334236280049 key: train_precision value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741 0.90946502 0.90965092 0.90946502 0.91188525] mean value: 0.9112826621141457 key: test_recall value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455 0.92727273 0.90909091 0.90909091 0.87272727] mean value: 0.9045117845117845 key: train_recall value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9 0.90204082 0.90408163 0.90204082 0.90816327] mean value: 0.9045858098840351 key: test_roc_auc value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949 0.92659933 0.90824916 0.91750842 0.88080808] mean value: 0.9073400673400673 key: train_roc_auc value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515 0.90621389 0.9072343 0.90621389 0.91029345] mean value: 0.9082569932249885 key: test_jcc value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172 0.86440678 0.83333333 0.84745763 0.78688525] mean value: 0.8305051161116039 key: train_jcc value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907 0.82771536 0.82958801 0.82771536 0.83489681] mean value: 0.8313720083087689 MCC on Blind test: 0.24 Accuracy on Blind test: 0.6 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
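Each printed pipeline pairs a 'prep' ColumnTransformer (MinMaxScaler over the 169 numerical columns, OneHotEncoder over the 7 categorical ones, remainder='passthrough') with the model under test. Below is a small self-contained sketch of that construction on a toy DataFrame; the values are invented and only a handful of the real column names are reused.

    # Sketch: the 'prep' + 'model' pipeline structure shown in this log, on toy data
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

    df = pd.DataFrame({'ligand_distance': [1.2, 3.4, 0.7, 5.1],      # toy values, not real data
                       'deepddg':         [-0.3, 0.8, 1.1, -1.2],
                       'ss_class':        ['helix', 'loop', 'sheet', 'loop'],
                       'active_site':     ['no', 'yes', 'no', 'yes']})
    y = [0, 1, 0, 1]

    prep = ColumnTransformer(transformers=[('num', MinMaxScaler(), ['ligand_distance', 'deepddg']),
                                           ('cat', OneHotEncoder(), ['ss_class', 'active_site'])],
                             remainder='passthrough')
    pipe = Pipeline(steps=[('prep', prep),
                           ('model', LogisticRegression(max_iter=1000, random_state=42))])
    pipe.fit(df, y)
    print(pipe.predict(df))
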
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0153439 0.01146626 0.01095605 0.01241088 0.01217341 0.01237679 0.01235223 0.01266718 0.01223803 0.01281548] mean value: 0.012480020523071289 key: score_time value: [0.03982353 0.01768589 0.01724744 0.01506829 0.01472926 0.01545072 0.01538634 0.015414 0.01499319 0.01525664] mean value: 0.018105530738830568 key: test_mcc value: [0.81882759 0.92659933 0.946411 0.96329966 0.946411 0.91202529 0.92719967 0.91202529 0.98181211 0.98181211] mean value: 0.931642304301634 key: train_mcc value: [0.95763309 0.95973466 0.95145652 0.95951143 0.95355358 0.95553835 0.96550464 0.96550464 0.94927144 0.94763304] mean value: 0.9565341373626831 key: test_accuracy value: [0.90825688 0.96330275 0.97247706 0.98165138 0.97247706 0.95412844 0.96330275 0.95412844 0.99082569 0.99082569] mean value: 0.9651376146788991 key: train_accuracy value: [0.97859327 0.97961264 0.97553517 0.97961264 0.97655454 0.9775739 0.98267074 0.98267074 0.9745158 0.97349643] mean value: 0.9780835881753314 key: test_fscore value: [0.91071429 0.96296296 0.97297297 0.98148148 0.97297297 0.95652174 0.96428571 0.95652174 0.99099099 0.99099099] mean value: 0.9660415850633242 key: train_fscore value: [0.97893681 0.97995992 0.97590361 0.97987928 0.97693079 0.9778672 0.98281092 0.98281092 0.97477296 0.9739479 ] mean value: 0.9783820308622914 key: test_precision value: [0.87931034 0.96296296 0.94736842 0.98148148 0.94736842 0.91666667 0.94736842 0.91666667 0.98214286 0.98214286] mean value: 0.9463479100048973 key: train_precision value: [0.96442688 0.96449704 0.96237624 0.96819085 0.96245059 0.96428571 0.9739479 0.9739479 0.96407186 0.95669291] mean value: 0.9654887879812519 key: test_recall value: [0.94444444 0.96296296 1. 0.98148148 1. 1. 0.98181818 1. 1. 1. 
] mean value: 0.9870707070707071 key: train_recall value: [0.99389002 0.99592668 0.9898167 0.99185336 0.99185336 0.99183673 0.99183673 0.99183673 0.98571429 0.99183673] mean value: 0.9916401346689389 key: test_roc_auc value: [0.90858586 0.96329966 0.97272727 0.98164983 0.97272727 0.9537037 0.96313131 0.9537037 0.99074074 0.99074074] mean value: 0.9651010101010101 key: train_roc_auc value: [0.97857766 0.97959599 0.9755206 0.97960015 0.97653893 0.97758843 0.98268008 0.98268008 0.9745272 0.97351511] mean value: 0.97808242237832 key: test_jcc value: [0.83606557 0.92857143 0.94736842 0.96363636 0.94736842 0.91666667 0.93103448 0.91666667 0.98214286 0.98214286] mean value: 0.9351663738461216 key: train_jcc value: [0.95874263 0.96070727 0.95294118 0.96055227 0.95490196 0.95669291 0.96620278 0.96620278 0.9507874 0.94921875] mean value: 0.9576949938828678 MCC on Blind test: 0.26 Accuracy on Blind test: 0.59 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
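Every model block ends with 'MCC on Blind test' and 'Accuracy on Blind test', i.e. the fitted pipeline scored once on the held-out blind test set rather than on the CV folds. A sketch of that final scoring step on synthetic data follows; the names X_bts / y_bts are placeholders and need not match the script's own variables.

    # Sketch: scoring a fitted model on a blind test split
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, matthews_corrcoef
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=400, weights=[0.9], random_state=42)
    X_train, X_bts, y_train, y_bts = train_test_split(X, y, test_size=0.5, stratify=y, random_state=42)
    model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)
    y_pred = model.predict(X_bts)
    print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
    print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))
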
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.03312922 0.02710271 0.02765346 0.02737236 0.02725911 0.02756953 0.0275867 0.02737093 0.02711725 0.02727389] mean value: 0.027943515777587892 key: score_time value: [0.01420808 0.01313591 0.01342154 0.01315141 0.01308537 0.01318836 0.01315355 0.01331091 0.0128572 0.01323676] mean value: 0.013274908065795898 key: test_mcc value: [0.90841751 0.94509941 1. 0.96329966 0.96396098 0.98181211 0.92905936 0.98181211 0.98181818 0.94635821] mean value: 0.9601637548953743 key: train_mcc value: [0.97968567 0.97560783 0.96946898 0.96946898 0.97556738 0.97351478 0.97351478 0.97149021 0.96946962 0.97560844] mean value: 0.9733396664150108 key: test_accuracy value: [0.95412844 0.97247706 1. 0.98165138 0.98165138 0.99082569 0.96330275 0.99082569 0.99082569 0.97247706] mean value: 0.9798165137614679 key: train_accuracy value: [0.98980632 0.98776758 0.98470948 0.98470948 0.98776758 0.98674822 0.98674822 0.98572885 0.98470948 0.98776758] mean value: 0.9866462793068298 key: test_fscore value: [0.95412844 0.97196262 1. 0.98148148 0.98181818 0.99099099 0.96491228 0.99099099 0.99082569 0.97345133] mean value: 0.9800561998679824 key: train_fscore value: [0.98987854 0.98785425 0.98480243 0.98480243 0.98782961 0.98677518 0.98677518 0.98577236 0.98477157 0.98782961] mean value: 0.9867091173333614 key: test_precision value: [0.94545455 0.98113208 1. 0.98148148 0.96428571 0.98214286 0.93220339 0.98214286 1. 0.94827586] mean value: 0.9717118782878628 key: train_precision value: [0.98390342 0.98189135 0.97983871 0.97983871 0.98383838 0.98377282 0.98377282 0.98178138 0.97979798 0.98185484] mean value: 0.9820290405776002 key: test_recall value: [0.96296296 0.96296296 1. 0.98148148 1. 1. 1. 1. 0.98181818 1. ] mean value: 0.9889225589225589 key: train_recall value: [0.99592668 0.99389002 0.9898167 0.9898167 0.99185336 0.98979592 0.98979592 0.98979592 0.98979592 0.99387755] mean value: 0.9914364686811588 key: test_roc_auc value: [0.95420875 0.97239057 1. 0.98164983 0.98181818 0.99074074 0.96296296 0.99074074 0.99090909 0.97222222] mean value: 0.9797643097643097 key: train_roc_auc value: [0.98980007 0.98776134 0.98470427 0.98470427 0.98776341 0.98675132 0.98675132 0.98573299 0.98471466 0.98777381] mean value: 0.9866457458747246 key: test_jcc value: [0.9122807 0.94545455 1. 
0.96363636 0.96428571 0.98214286 0.93220339 0.98214286 0.98181818 0.94827586] mean value: 0.9612240473134379 key: train_jcc value: [0.97995992 0.976 0.97005988 0.97005988 0.9759519 0.97389558 0.97389558 0.97194389 0.97 0.9759519 ] mean value: 0.9737718540368138 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
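The repeating 'Model_name' / 'List of models' / 'Running model pipeline' pattern suggests a driver loop over (name, estimator) tuples, each estimator wrapped in the same preprocessing pipeline before evaluation. The sketch below illustrates that pattern on synthetic data, with a plain MinMaxScaler standing in for the full preprocessor; it is a guess at the structure, not the script's actual code.

    # Sketch: looping over (name, estimator) pairs and wrapping each in a pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import matthews_corrcoef
    from sklearn.naive_bayes import BernoulliNB, GaussianNB
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, weights=[0.9], random_state=42)
    models = [('Logistic Regression', LogisticRegression(max_iter=1000, random_state=42)),
              ('Gaussian NB', GaussianNB()),
              ('Naive Bayes', BernoulliNB()),
              ('Decision Tree', DecisionTreeClassifier(random_state=42))]

    for name, estimator in models:
        print('Model_name:', name)
        pipe = Pipeline(steps=[('prep', MinMaxScaler()), ('model', estimator)])
        print('Running model pipeline:', pipe)
        pipe.fit(X, y)
        print('training MCC:', round(matthews_corrcoef(y, pipe.predict(X)), 2))
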
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.23760557 2.39744544 2.93186569 2.52583194 2.58781528 2.25252485 2.65761113 2.15109515 1.78711081 2.11473489] mean value: 2.3643640756607054 key: score_time value: [0.01260877 0.01268816 0.01267433 0.01268578 0.01266146 0.01271725 0.01503229 0.0170579 0.01278281 0.01270413] mean value: 0.013361287117004395 key: test_mcc value: [0.94511785 0.92724828 0.98181211 0.96329966 0.92724828 0.98181211 0.91202529 0.98181211 0.96396098 0.96393713] mean value: 0.954827381792807 key: train_mcc value: [1. 1. 1. 1. 1. 0.99796334 1. 0.99796334 0.99592252 0.99796334] mean value: 0.9989812544162268 key: test_accuracy value: [0.97247706 0.96330275 0.99082569 0.98165138 0.96330275 0.99082569 0.95412844 0.99082569 0.98165138 0.98165138] mean value: 0.9770642201834863 key: train_accuracy value: [1. 1. 1. 1. 1. 0.99898063 1. 0.99898063 0.99796126 0.99898063] mean value: 0.9994903160040775 key: test_fscore value: [0.97247706 0.96363636 0.99065421 0.98148148 0.96363636 0.99099099 0.95652174 0.99099099 0.98148148 0.98214286] mean value: 0.9774013538318624 key: train_fscore value: [1. 1. 1. 1. 1. 0.99898063 1. 0.99898063 0.99795918 0.99898063] mean value: 0.9994901079697934 key: test_precision value: [0.96363636 0.94642857 1. 0.98148148 0.94642857 0.98214286 0.91666667 0.98214286 1. 0.96491228] mean value: 0.9683839649629123 key: train_precision value: [1. 1. 1. 1. 1. 0.99796334 1. 0.99796334 0.99795918 0.99796334] mean value: 0.9991849204040069 key: test_recall value: [0.98148148 0.98148148 0.98148148 0.98148148 0.98148148 1. 1. 1. 0.96363636 1. ] mean value: 0.9871043771043772 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99795918 1. ] mean value: 0.9997959183673469 key: test_roc_auc value: [0.97255892 0.96346801 0.99074074 0.98164983 0.96346801 0.99074074 0.9537037 0.99074074 0.98181818 0.98148148] mean value: 0.977037037037037 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.99898167 1. 0.99898167 0.99796126 0.99898167] mean value: 0.9994906272081133 key: test_jcc value: [0.94642857 0.92982456 0.98148148 0.96363636 0.92982456 0.98214286 0.91666667 0.98214286 0.96363636 0.96491228] mean value: 0.9560696564643933 key: train_jcc value: [1. 1. 1. 1. 1. 0.99796334 1. 
0.99796334 0.99592668 0.99796334] mean value: 0.9989816700610998 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.07054901 0.06181669 0.06172919 0.05932713 0.06351018 0.06178021 0.0615108 0.0635767 0.0611248 0.06372452] mean value: 0.06286492347717285 key: score_time value: [0.00950432 0.00932312 0.00905943 0.00901723 0.00903702 0.00915956 0.00907493 0.00908566 0.0090878 0.00900388] mean value: 0.009135293960571288 key: test_mcc value: [0.94509941 1. 0.96393713 0.98181211 0.9291517 0.94511785 0.88989899 0.96393713 0.96393713 0.92905936] mean value: 0.9511950806790457 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97247706 1. 
0.98165138 0.99082569 0.96330275 0.97247706 0.94495413 0.98165138 0.98165138 0.96330275] mean value: 0.9752293577981651 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97196262 1. 0.98113208 0.99065421 0.96428571 0.97247706 0.94545455 0.98214286 0.98214286 0.96491228] mean value: 0.9755164216849517 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.98113208 1. 1. 1. 0.93103448 0.98148148 0.94545455 0.96491228 0.96491228 0.93220339] mean value: 0.9701130536400363 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636 0.94545455 1. 1. 1. ] mean value: 0.9816498316498317 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97239057 1. 0.98148148 0.99074074 0.96363636 0.97255892 0.94494949 0.98148148 0.98148148 0.96296296] mean value: 0.9751683501683501 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94545455 1. 0.96296296 0.98148148 0.93103448 0.94642857 0.89655172 0.96491228 0.96491228 0.93220339] mean value: 0.952594171945813 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.56 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', 
transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.13620734 0.13592768 0.14145327 0.13965559 0.13692927 0.13765979 0.13613105 0.13678074 0.13775492 0.13754439] mean value: 0.13760440349578856 key: score_time value: [0.01817727 0.01823282 0.01932836 0.01829553 0.01836991 0.01836514 0.01826692 0.01823115 0.01880527 0.01826 ] mean value: 0.018433237075805665 key: test_mcc value: [0.90841751 0.94509941 1. 0.98181211 0.98181818 1. 0.94509941 0.96329966 1. 0.96393713] mean value: 0.9689483423563136 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95412844 0.97247706 1. 0.99082569 0.99082569 1. 0.97247706 0.98165138 1. 0.98165138] mean value: 0.9844036697247707 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95412844 0.97196262 1. 0.99065421 0.99082569 1. 0.97297297 0.98181818 1. 0.98214286] mean value: 0.9844504962804286 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94545455 0.98113208 1. 1. 0.98181818 1. 0.96428571 0.98181818 1. 0.96491228] mean value: 0.9819420979550075 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 0.96296296 1. 0.98148148 1. 1. 0.98181818 0.98181818 1. 1. ] mean value: 0.987104377104377 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95420875 0.97239057 1. 0.99074074 0.99090909 1. 0.97239057 0.98164983 1. 0.98148148] mean value: 0.9843771043771044 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9122807 0.94545455 1. 0.98148148 0.98181818 1. 0.94736842 0.96428571 1. 0.96491228] mean value: 0.9697601326548695 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.56 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01190448 0.01207089 0.01192713 0.0119226 0.0119257 0.01208067 0.01233387 0.01218677 0.01201677 0.01214457] mean value: 0.01205134391784668 key: score_time value: [0.00913978 0.00910997 0.00899887 0.00900722 0.00899005 0.00908947 0.00914955 0.00911641 0.00894523 0.00904179] mean value: 0.009058833122253418 key: test_mcc value: [0.87171717 0.90838671 0.94511785 0.92724828 0.90841751 0.87167401 0.90838671 0.90841751 0.87171717 0.92719967] mean value: 0.9048282595925088 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93577982 0.95412844 0.97247706 0.96330275 0.95412844 0.93577982 0.95412844 0.95412844 0.93577982 0.96330275] mean value: 0.9522935779816514 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93577982 0.95327103 0.97247706 0.96363636 0.95412844 0.93693694 0.95495495 0.95412844 0.93577982 0.96428571] mean value: 0.9525378575833003 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92727273 0.96226415 0.96363636 0.94642857 0.94545455 0.92857143 0.94642857 0.96296296 0.94444444 0.94736842] mean value: 0.9474832187195643 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.94444444 0.94444444 0.98148148 0.98148148 0.96296296 0.94545455 0.96363636 0.94545455 0.92727273 0.98181818] mean value: 0.9578451178451178 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93585859 0.9540404 0.97255892 0.96346801 0.95420875 0.93569024 0.9540404 0.95420875 0.93585859 0.96313131] mean value: 0.9523063973063972 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.87931034 0.91071429 0.94642857 0.92982456 0.9122807 0.88135593 0.9137931 0.9122807 0.87931034 0.93103448] mean value: 0.9096333030120597 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.6 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', 
RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [3.192981 3.21978426 3.26361322 3.27795982 3.19749761 3.2129035 3.14740181 3.20146179 3.23462439 3.21413517] mean value: 3.2162362575531005 key: score_time value: [0.09524465 0.09526968 0.09523535 0.09465885 0.09512901 0.09531832 0.09523821 0.09471774 0.09536004 0.09545851] mean value: 0.09516303539276123 key: test_mcc value: [0.96329966 1. 0.98181211 0.98181211 0.98181818 1. 0.94511785 0.98181211 1. 0.96393713] mean value: 0.9799609160184436 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98165138 1. 0.99082569 0.99082569 0.99082569 1. 0.97247706 0.99082569 1. 0.98165138] mean value: 0.989908256880734 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98148148 1. 0.99065421 0.99065421 0.99082569 1. 0.97247706 0.99099099 1. 0.98214286] mean value: 0.989922649312386 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.98148148 1. 1. 1. 0.98181818 1. 0.98148148 0.98214286 1. 0.96491228] mean value: 0.9891836282625757 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.98148148 1. 0.98148148 0.98148148 1. 1. 0.96363636 1. 1. 1. ] mean value: 0.9908080808080808 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98164983 1. 0.99074074 0.99074074 0.99090909 1. 0.97255892 0.99074074 1. 0.98148148] mean value: 0.9898821548821548 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96363636 1. 0.98148148 0.98148148 0.98181818 1. 0.94642857 0.98214286 1. 0.96491228] mean value: 0.9801901217690692 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.25 Accuracy on Blind test: 0.57 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
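The FutureWarning above (emitted once per fit) is resolved exactly as it says: for classifiers, max_features='auto' maps to 'sqrt', so passing 'sqrt' explicitly (or dropping the argument) keeps the same behaviour and silences the warning. A minimal sketch; X_train / y_train in the comment are placeholders, not this script's variables.

    # Sketch: warning-free equivalent of the Random Forest2 configuration
    from sklearn.ensemble import RandomForestClassifier

    rf2 = RandomForestClassifier(max_features='sqrt',   # what 'auto' meant for classifiers
                                 min_samples_leaf=5, n_estimators=1000,
                                 n_jobs=10, oob_score=True, random_state=42)
    # rf2.fit(X_train, y_train); print(rf2.oob_score_)  # X_train / y_train are placeholders
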
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.15664482 1.25536323 1.23685193 1.19847178 1.22918534 1.21921778 1.28991628 1.25609398 1.18945456 1.21153784] mean value: 1.2242737531661987 key: score_time value: [0.28573251 0.21750188 0.26001978 0.21304607 0.22797418 0.21436405 0.26951408 0.28902078 0.24877381 0.24985695] mean value: 0.2475804090499878 key: test_mcc value: [0.94511785 1. 0.98181211 0.98181211 0.92724828 0.98181211 0.90838671 0.98181211 0.96393713 0.96393713] mean value: 0.9635875555795308 key: train_mcc value: [0.98980835 0.98573085 0.98573085 0.98777573 0.98776757 0.98573091 0.98777583 0.98980835 0.98573091 0.98776757] mean value: 0.9873626926348411 key: test_accuracy value: [0.97247706 1. 0.99082569 0.99082569 0.96330275 0.99082569 0.95412844 0.99082569 0.98165138 0.98165138] mean value: 0.981651376146789 key: train_accuracy value: [0.99490316 0.99286442 0.99286442 0.99388379 0.99388379 0.99286442 0.99388379 0.99490316 0.99286442 0.99388379] mean value: 0.9936799184505607 key: test_fscore value: [0.97247706 1. 0.99065421 0.99065421 0.96363636 0.99099099 0.95495495 0.99099099 0.98214286 0.98214286] mean value: 0.9818644490294152 key: train_fscore value: [0.99491353 0.99287894 0.99287894 0.99390244 0.99389002 0.99286442 0.99389002 0.99489275 0.99286442 0.99387755] mean value: 0.993685304063256 key: test_precision value: [0.96363636 1. 1. 1. 0.94642857 0.98214286 0.94642857 0.98214286 0.96491228 0.96491228] mean value: 0.975060378218273 key: train_precision value: [0.99390244 0.99186992 0.99186992 0.99188641 0.99389002 0.99185336 0.99186992 0.99591002 0.99185336 0.99387755] mean value: 0.992878291767276 key: test_recall value: [0.98148148 1. 0.98148148 0.98148148 0.98148148 1. 0.96363636 1. 1. 1. ] mean value: 0.9889562289562289 key: train_recall value: [0.99592668 0.99389002 0.99389002 0.99592668 0.99389002 0.99387755 0.99591837 0.99387755 0.99387755 0.99387755] mean value: 0.9944951993017166 key: test_roc_auc value: [0.97255892 1. 0.99074074 0.99074074 0.96346801 0.99074074 0.9540404 0.99074074 0.98148148 0.98148148] mean value: 0.9815993265993266 key: train_roc_auc value: [0.99490212 0.99286338 0.99286338 0.99388171 0.99388379 0.99286546 0.99388586 0.99490212 0.99286546 0.99388379] mean value: 0.9936797040608504 key: test_jcc value: [0.94642857 1. 
0.98148148 0.98148148 0.92982456 0.98214286 0.9137931 0.98214286 0.96491228 0.96491228] mean value: 0.9647119474932542 key: train_jcc value: [0.98987854 0.98585859 0.98585859 0.98787879 0.98785425 0.98582996 0.98785425 0.9898374 0.98582996 0.98782961] mean value: 0.9874509936137159 MCC on Blind test: 0.3 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
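When scanning a run this long, it can help to tabulate each model's mean CV MCC against its blind-test MCC and accuracy. Below is a small illustrative aggregation with a few values copied from the blocks above; the table is not produced by the script itself.

    # Sketch: hand-built summary of a few models reported in this log
    import pandas as pd

    summary = pd.DataFrame([
        {'model': 'Random Forest',  'cv_mcc_mean': 0.980, 'blind_mcc': 0.25, 'blind_accuracy': 0.57},
        {'model': 'Random Forest2', 'cv_mcc_mean': 0.964, 'blind_mcc': 0.30, 'blind_accuracy': 0.59},
        {'model': 'Naive Bayes',    'cv_mcc_mean': 0.816, 'blind_mcc': 0.24, 'blind_accuracy': 0.60},
    ])
    print(summary.sort_values('blind_mcc', ascending=False))
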
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02857256 0.01666284 0.01697707 0.01707339 0.01707959 0.01700878 0.01709127 0.01708722 0.01719165 0.01719737] mean value: 0.018194174766540526 key: score_time value: [0.01237464 0.01241755 0.01254058 0.01241088 0.01241207 0.01243305 0.01244402 0.01243639 0.01238298 0.01241779] mean value: 0.012426996231079101 key: test_mcc value: [0.74351235 0.85541029 0.80136396 0.81882759 0.78478168 0.88989899 0.85319865 0.81649832 0.83501684 0.76161616] mean value: 0.8160124823784499 key: train_mcc value: [0.82670934 0.81447691 0.82263186 0.81249922 0.82076536 0.80838391 0.81246155 0.81448891 0.81246155 0.82059712] mean value: 0.816547573939551 key: test_accuracy value: [0.87155963 0.9266055 0.89908257 0.90825688 0.88990826 0.94495413 0.9266055 0.90825688 0.91743119 0.88073394] mean value: 0.9073394495412844 key: train_accuracy value: [0.91335372 0.90723751 0.91131498 0.90621814 0.91029562 0.90417941 0.90621814 0.90723751 0.90621814 0.91029562] mean value: 0.908256880733945 key: test_fscore value: [0.86792453 0.92307692 0.89320388 0.91071429 0.89473684 0.94545455 0.92727273 0.90909091 0.91743119 0.88073394] mean value: 0.9069639782126365 key: train_fscore value: [0.91335372 0.90723751 0.91131498 0.9057377 0.90946502 0.90368852 0.9057377 0.90685773 0.9057377 0.91002045] mean value: 0.9079151055700868 key: test_precision value: [0.88461538 0.96 0.93877551 0.87931034 0.85 0.94545455 0.92727273 0.90909091 0.92592593 0.88888889] mean value: 0.9109334236280049 key: train_precision value: [0.91428571 0.90816327 0.9122449 0.91134021 0.91891892 0.90740741 0.90946502 0.90965092 0.90946502 0.91188525] mean value: 0.9112826621141457 key: test_recall value: [0.85185185 0.88888889 0.85185185 0.94444444 0.94444444 0.94545455 0.92727273 0.90909091 0.90909091 0.87272727] mean value: 0.9045117845117845 key: train_recall value: [0.91242363 0.90631365 0.91038697 0.90020367 0.90020367 0.9 0.90204082 0.90408163 0.90204082 0.90816327] mean value: 0.9045858098840351 key: test_roc_auc value: [0.87138047 0.92626263 0.8986532 0.90858586 0.89040404 0.94494949 0.92659933 0.90824916 0.91750842 0.88080808] mean value: 0.9073400673400673 key: train_roc_auc value: [0.91335467 0.90723846 0.91131593 0.90622428 0.91030591 0.90417515 0.90621389 0.9072343 0.90621389 0.91029345] mean value: 0.9082569932249885 key: test_jcc value: [0.76666667 0.85714286 0.80701754 0.83606557 0.80952381 0.89655172 0.86440678 0.83333333 0.84745763 0.78688525] mean value: 0.8305051161116039 key: train_jcc value: [0.84052533 0.83022388 0.83707865 0.82771536 0.83396226 0.82429907 0.82771536 0.82958801 0.82771536 0.83489681] mean value: 0.8313720083087689 MCC on Blind test: 0.24 Accuracy on Blind test: 0.6 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, 
reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.13928676 0.13060784 0.21524143 0.14963627 0.15037847 0.1270678 0.13197303 0.13237286 0.12784719 0.12808561] mean value: 0.1432497262954712 key: score_time value: [0.01138091 0.01127362 0.01427794 0.01234388 0.01148152 0.01167202 0.01126146 0.01150727 0.01140141 0.011204 ] mean value: 0.011780405044555664 key: test_mcc value: [0.90841751 1. 0.96393713 0.98181211 0.96396098 0.96329966 0.94511785 0.96329966 0.98181211 0.96393713] mean value: 0.9635594149827394 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95412844 1. 
0.98165138 0.99082569 0.98165138 0.98165138 0.97247706 0.98165138 0.99082569 0.98165138] mean value: 0.981651376146789 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95412844 1. 0.98113208 0.99065421 0.98181818 0.98181818 0.97247706 0.98181818 0.99099099 0.98214286] mean value: 0.9816980179254724 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94545455 1. 1. 1. 0.96428571 0.98181818 0.98148148 0.98181818 0.98214286 0.96491228] mean value: 0.9801913242702717 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.96296296 0.98148148 1. 0.98181818 0.96363636 0.98181818 1. 1. ] mean value: 0.9834680134680135 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95420875 1. 0.98148148 0.99074074 0.98181818 0.98164983 0.97255892 0.98164983 0.99074074 0.98148148] mean value: 0.9816329966329966 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9122807 1. 0.96296296 0.98148148 0.96428571 0.96428571 0.94642857 0.96428571 0.98214286 0.96491228] mean value: 0.9643065998329157 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.27 Accuracy on Blind test: 0.59 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), 
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05295801 0.08014059 0.09963298 0.083951 0.06942058 0.08701682 0.0714407 0.08109713 0.0660603 0.09231591] mean value: 0.0784034013748169 key: score_time value: [0.01235604 0.01253247 0.03159952 0.02604461 0.01255369 0.02007866 0.01244569 0.01251531 0.0123868 0.02083731] mean value: 0.017335009574890137 key: test_mcc value: [0.946411 0.89544301 0.946411 0.98181818 0.98181818 0.96393713 0.89223482 0.846254 0.96393713 0.91202529] mean value: 0.9330289738014781 key: train_mcc value: [0.9837635 0.97773807 0.9857799 0.97974261 0.9837635 0.9857802 0.97981668 0.9817516 0.9837639 0.9817516 ] mean value: 0.9823651560326023 key: test_accuracy value: [0.97247706 0.94495413 0.97247706 0.99082569 0.99082569 0.98165138 0.94495413 0.91743119 0.98165138 0.95412844] mean value: 0.9651376146788991 key: train_accuracy value: [0.99184506 0.98878695 0.99286442 0.98980632 0.99184506 0.99286442 0.98980632 0.99082569 0.99184506 0.99082569] mean value: 0.9911314984709481 key: test_fscore value: [0.97297297 0.94736842 0.97297297 0.99082569 0.99082569 0.98214286 0.94736842 0.92436975 0.98214286 0.95652174] mean value: 0.9667511365513307 key: train_fscore value: [0.99190283 0.9889001 0.9929078 0.98989899 0.99190283 0.9928934 0.98989899 0.99088146 0.99188641 0.99088146] mean value: 0.9911954278825454 key: test_precision value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228 0.91525424 0.859375 0.96491228 0.91666667] mean value: 0.9379493671099938 key: train_precision value: [0.98591549 0.98 0.98790323 0.98196393 0.98591549 0.98787879 0.98 0.98390342 0.9858871 0.98390342] mean value: 0.9843270865276915 key: test_recall value: [1. 1. 1. 1. 1. 1. 0.98181818 1. 1. 1. ] mean value: 0.9981818181818182 key: train_recall value: [0.99796334 0.99796334 0.99796334 0.99796334 0.99796334 0.99795918 1. 
0.99795918 0.99795918 0.99795918] mean value: 0.9981653435304876 key: test_roc_auc value: [0.97272727 0.94545455 0.97272727 0.99090909 0.99090909 0.98148148 0.94461279 0.91666667 0.98148148 0.9537037 ] mean value: 0.9650673400673401 key: train_roc_auc value: [0.99183881 0.98877759 0.99285922 0.989798 0.99183881 0.99286961 0.9898167 0.99083295 0.99185128 0.99083295] mean value: 0.9911315931667983 key: test_jcc value: [0.94736842 0.9 0.94736842 0.98181818 0.98181818 0.96491228 0.9 0.859375 0.96491228 0.91666667] mean value: 0.9364239433811802 key: train_jcc value: [0.98393574 0.97804391 0.98591549 0.98 0.98393574 0.9858871 0.98 0.98192771 0.98390342 0.98192771] mean value: 0.9825476830061249 MCC on Blind test: 0.21 Accuracy on Blind test: 0.57 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02079773 0.01609492 0.01598954 0.01628113 0.0189333 0.01634192 0.01601601 0.01622939 0.01595616 0.01598763] mean value: 0.016862773895263673 key: score_time value: [0.01259875 0.01211143 0.01218486 0.0121243 0.01218343 0.01213431 0.01215887 0.01219344 0.01213384 0.01207805] mean value: 0.012190127372741699 key: test_mcc value: [0.76248469 0.78039748 0.83496131 0.76161616 0.71100747 0.79824861 0.74368478 0.78039748 0.79824861 0.78039748] mean value: 0.7751444102497523 key: train_mcc value: [0.77984039 0.76966498 0.76962372 0.77777963 0.79002606 0.77574619 0.77370215 0.76962372 0.77373792 0.77371595] mean value: 0.7753460692713874 key: test_accuracy value: [0.88073394 0.88990826 0.91743119 0.88073394 0.85321101 0.89908257 0.87155963 0.88990826 0.89908257 0.88990826] mean value: 0.8871559633027524 key: train_accuracy value: [0.88990826 0.88481142 0.88481142 0.88888889 0.8950051 0.88786952 0.88685015 0.88481142 0.88685015 0.88685015] mean value: 0.8876656472986748 key: test_fscore value: [0.87619048 0.89090909 0.91588785 0.88073394 0.85964912 0.9009009 0.87037037 0.88888889 0.9009009 0.88888889] mean value: 0.8873320435277953 key: train_fscore value: [0.89046653 0.88433982 0.88504578 0.88888889 0.8947906 0.88798371 0.88685015 0.8845761 0.88615385 0.88708037] mean value: 0.8876175787042375 key: test_precision value: [0.90196078 0.875 0.9245283 0.87272727 0.81666667 0.89285714 0.88679245 0.90566038 0.89285714 0.90566038] mean value: 0.8874710518855913 key: train_precision value: [0.88686869 0.88888889 0.88414634 0.88979592 0.89754098 0.88617886 0.88594705 0.88548057 0.89072165 0.88438134] mean value: 0.8879950288650756 key: test_recall value: [0.85185185 0.90740741 0.90740741 0.88888889 0.90740741 0.90909091 0.85454545 0.87272727 0.90909091 0.87272727] mean value: 0.8881144781144781 key: train_recall value: [0.89409369 0.87983707 0.88594705 0.88798371 0.89205703 0.88979592 0.8877551 0.88367347 0.88163265 0.88979592] mean value: 0.8872571594829378 key: test_roc_auc value: [0.88047138 0.89006734 0.91734007 0.88080808 0.8537037 0.8989899 0.87171717 0.89006734 0.8989899 0.89006734] mean value: 0.8872222222222222 key: train_roc_auc value: [0.88990399 0.88481649 0.88481026 0.88888981 0.89500811 0.88787148 0.88685107 0.88481026 0.88684484 0.88685315] mean value: 0.8876659462155534 key: test_jcc value: [0.77966102 0.80327869 0.84482759 0.78688525 0.75384615 0.81967213 0.7704918 0.8 0.81967213 0.8 ] mean value: 0.7978334757002203 key: train_jcc value: [0.80255941 0.79266055 0.79379562 0.8 0.80961183 0.7985348 0.7967033 0.79304029 0.79558011 0.79707495] mean value: 0.7979560868903866 MCC on Blind test: 0.37 Accuracy on Blind test: 0.66 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03137422 0.03929567 0.02904129 0.03385782 0.03152227 0.0290134 0.02575612 0.02675271 0.035393 0.02952981] mean value: 0.03115363121032715 key: score_time value: [0.01219153 0.01214862 0.01207137 0.01212645 0.01213336 0.01214814 0.01212502 0.01226616 0.0121429 0.01237202] mean value: 0.012172555923461914 key: test_mcc value: [0.90841751 0.92724828 0.96393713 0.96393713 0.946411 0.96393713 0.92905936 0.98181211 1. 0.96393713] mean value: 0.9548696778106656 key: train_mcc value: [0.99390234 0.9778193 0.9837639 0.9406024 0.99187796 0.98166983 0.98170255 0.98980839 0.95806676 0.9837635 ] mean value: 0.978297693597624 key: test_accuracy value: [0.95412844 0.96330275 0.98165138 0.98165138 0.97247706 0.98165138 0.96330275 0.99082569 1. 0.98165138] mean value: 0.9770642201834863 key: train_accuracy value: [0.9969419 0.98878695 0.99184506 0.96941896 0.99592253 0.99082569 0.99082569 0.99490316 0.97859327 0.99184506] mean value: 0.9889908256880734 key: test_fscore value: [0.95412844 0.96363636 0.98113208 0.98113208 0.97297297 0.98214286 0.96491228 0.99099099 1. 
0.98214286] mean value: 0.9773190913898165 key: train_fscore value: [0.99695431 0.98892246 0.99180328 0.96848739 0.9959432 0.99084435 0.99086294 0.99490316 0.97902098 0.99178645] mean value: 0.9889528535316982 key: test_precision value: [0.94545455 0.94642857 1. 1. 0.94736842 0.96491228 0.93220339 0.98214286 1. 0.96491228] mean value: 0.9683422346312622 key: train_precision value: [0.99392713 0.97808765 0.99793814 1. 0.99191919 0.98782961 0.98585859 0.99389002 0.95890411 0.99793388] mean value: 0.9886288325873761 key: test_recall value: [0.96296296 0.98148148 0.96296296 0.96296296 1. 1. 1. 1. 1. 1. ] mean value: 0.987037037037037 key: train_recall value: [1. 1. 0.98574338 0.9389002 1. 0.99387755 0.99591837 0.99591837 1. 0.98571429] mean value: 0.9896072155949956 key: test_roc_auc value: [0.95420875 0.96346801 0.98148148 0.98148148 0.97272727 0.98148148 0.96296296 0.99074074 1. 0.98148148] mean value: 0.977003367003367 key: train_roc_auc value: [0.99693878 0.98877551 0.99185128 0.9694501 0.99591837 0.9908288 0.99083087 0.99490419 0.97861507 0.99183881] mean value: 0.9889951785194729 key: test_jcc value: [0.9122807 0.92982456 0.96296296 0.96296296 0.94736842 0.96491228 0.93220339 0.98214286 1. 0.96491228] mean value: 0.9559570418513327 key: train_jcc value: [0.99392713 0.97808765 0.98373984 0.9389002 0.99191919 0.98185484 0.98189135 0.98985801 0.95890411 0.98370672] mean value: 0.9782789037427249 MCC on Blind test: 0.23 Accuracy on Blind test: 0.57 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running 
model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02381825 0.02869153 0.02777767 0.02622557 0.02247834 0.02444983 0.0302763 0.02613544 0.02430272 0.02424407] mean value: 0.025839972496032714 key: score_time value: [0.01220822 0.01211882 0.01219845 0.0121603 0.01212859 0.01205921 0.01216602 0.01215434 0.01211047 0.01222229] mean value: 0.012152671813964844 key: test_mcc value: [0.7935982 0.92719967 0.8789643 0.98181818 0.90838671 0.96393713 0.92905936 0.96329966 1. 0.96393713] mean value: 0.9310200334445711 key: train_mcc value: [0.81490143 0.9677754 0.94445227 0.9836901 0.94386773 0.96789632 0.99185333 0.98372267 0.9367601 0.98175107] mean value: 0.951667042747277 key: test_accuracy value: [0.88990826 0.96330275 0.93577982 0.99082569 0.95412844 0.98165138 0.96330275 0.98165138 1. 0.98165138] mean value: 0.9642201834862385 key: train_accuracy value: [0.89908257 0.98369011 0.9714577 0.99184506 0.9714577 0.98369011 0.99592253 0.99184506 0.96738022 0.99082569] mean value: 0.9747196738022426 key: test_fscore value: [0.89830508 0.96226415 0.93913043 0.99082569 0.95327103 0.98214286 0.96491228 0.98181818 1. 0.98214286] mean value: 0.9654812563388195 key: train_fscore value: [0.90841813 0.98347107 0.97227723 0.99185336 0.97083333 0.98393574 0.99592668 0.99180328 0.96837945 0.99075026] mean value: 0.9757648532767356 key: test_precision value: [0.828125 0.98076923 0.8852459 0.98181818 0.96226415 0.96491228 0.93220339 0.98181818 1. 0.96491228] mean value: 0.9482068598222352 key: train_precision value: [0.83220339 0.99790356 0.9460501 0.99185336 0.99360341 0.96837945 0.99390244 0.99588477 0.93869732 0.99792961] mean value: 0.9656407406073759 key: test_recall value: [0.98148148 0.94444444 1. 1. 0.94444444 1. 1. 0.98181818 1. 1. ] mean value: 0.9852188552188552 key: train_recall value: [1. 0.9694501 1. 0.99185336 0.9490835 1. 0.99795918 0.9877551 1. 0.98367347] mean value: 0.9879774720478822 key: test_roc_auc value: [0.89074074 0.96313131 0.93636364 0.99090909 0.9540404 0.98148148 0.96296296 0.98164983 1. 0.98148148] mean value: 0.9642760942760943 key: train_roc_auc value: [0.89897959 0.98370464 0.97142857 0.99184505 0.97148053 0.98370672 0.9959246 0.99184089 0.96741344 0.9908184 ] mean value: 0.9747142441497983 key: test_jcc value: [0.81538462 0.92727273 0.8852459 0.98181818 0.91071429 0.96491228 0.93220339 0.96428571 1. 
0.96491228] mean value: 0.9346749377348886 key: train_jcc value: [0.83220339 0.96747967 0.9460501 0.98383838 0.94331984 0.96837945 0.99188641 0.98373984 0.93869732 0.98167006] mean value: 0.9537264455743892 MCC on Blind test: 0.33 Accuracy on Blind test: 0.62 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.51863694 0.50481296 0.51417375 0.50742078 0.510185 0.50463581 0.50489211 0.50328565 0.51146102 0.50386643] mean value: 0.5083370447158814 key: score_time value: [0.01617646 0.01697731 0.01649475 0.01617336 0.01631141 0.0161674 0.01612544 0.01631689 0.01627827 0.01620293] mean value: 0.01632242202758789 key: test_mcc value: [0.89053558 0.98181211 0.98181211 0.98181211 0.98181818 0.98181211 0.94509941 0.98181211 1. 
0.98181211] mean value: 0.9708325861574671 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94495413 0.99082569 0.99082569 0.99082569 0.99082569 0.99082569 0.97247706 0.99082569 1. 0.99082569] mean value: 0.9853211009174312 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94545455 0.99065421 0.99065421 0.99065421 0.99082569 0.99099099 0.97297297 0.99099099 1. 0.99099099] mean value: 0.9854188796296316 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92857143 1. 1. 1. 0.98181818 0.98214286 0.96428571 0.98214286 1. 0.98214286] mean value: 0.9821103896103895 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 0.98148148 0.98148148 0.98148148 1. 1. 0.98181818 1. 1. 1. ] mean value: 0.988922558922559 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94511785 0.99074074 0.99074074 0.99074074 0.99090909 0.99074074 0.97239057 0.99074074 1. 0.99074074] mean value: 0.9852861952861953 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89655172 0.98148148 0.98148148 0.98148148 0.98181818 0.98214286 0.94736842 0.98214286 1. 0.98214286] mean value: 0.971661134288176 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.6 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), 
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [this UserWarning/RuntimeWarning pair was emitted once per cross-validation fold]
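The pair of warnings above comes from scikit-learn's BaggingClassifier when it is run with oob_score=True and its default of only 10 base estimators: a few training rows end up inside every bootstrap sample, so they have no out-of-bag prediction and their OOB vote is 0/0 (hence the invalid-value RuntimeWarning from true_divide). Below is a minimal, self-contained sketch of the effect on synthetic data; it is not part of ml_data_rt.py, and the sample sizes are illustrative only.

# Hedged sketch: reproduces the "Some inputs do not have OOB scores" warning and
# shows that a larger ensemble makes the OOB estimate well defined. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, weights=[0.95], random_state=42)

# Default n_estimators=10: some rows appear in every bootstrap sample, so they get
# no out-of-bag vote and sklearn warns that the OOB score is unreliable.
small = BaggingClassifier(n_estimators=10, oob_score=True, n_jobs=10, random_state=42).fit(X, y)

# With many more estimators every row is almost surely left out at least once,
# and oob_score_ becomes a usable internal validation estimate.
large = BaggingClassifier(n_estimators=200, oob_score=True, n_jobs=10, random_state=42).fit(X, y)
print(small.oob_score_, large.oob_score_)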
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.21921444 0.22345185 0.21574569 0.2206254 0.22202802 0.13987017 0.22048545 0.23433447 0.23488665 0.21739197] mean value: 0.21480340957641603 key: score_time value: [0.03116179 0.02156115 0.04347515 0.02509618 0.03190017 0.02411628 0.04481173 0.02482653 0.038275 0.03011703] mean value: 0.03153409957885742 key: test_mcc value: [0.92659933 0.98181211 0.94509941 0.98181211 0.96396098 0.94511785 0.94511785 0.94511785 0.98181211 0.98181211] mean value: 0.9598261714386186 key: train_mcc value: [0.9918781 0.99390241 0.99388586 0.99592252 0.99390241 0.99185326 0.99592252 0.99187796 0.99796333 0.99388584] mean value: 0.9940994227573303 key: test_accuracy value: [0.96330275 0.99082569 0.97247706 0.99082569 0.98165138 0.97247706 0.97247706 0.97247706 0.99082569 0.99082569] mean value: 0.9798165137614679 key: train_accuracy value: [0.99592253 0.9969419 0.9969419 0.99796126 0.9969419 0.99592253 0.99796126 0.99592253 0.99898063 0.9969419 ] mean value: 0.9970438328236494 key: test_fscore value: [0.96296296 0.99065421 0.97196262 0.99065421 0.98181818 0.97247706 0.97247706 0.97247706 0.99099099 0.99099099] mean value: 0.9797465347461061 key: train_fscore value: [0.99591002 0.99693565 0.9969419 0.99796334 0.99693565 0.99591002 0.99795918 0.99590164 0.99897855 0.99693565] mean value: 0.9970371595467664 key: test_precision value: [0.96296296 1. 0.98113208 1. 0.96428571 0.98148148 0.98148148 0.98148148 0.98214286 0.98214286] mean value: 0.9817110911450534 key: train_precision value: [1. 1. 0.99795918 0.99796334 1. 0.99795082 0.99795918 1. 1. 0.99795501] mean value: 0.9989787537366218 key: test_recall value: [0.96296296 0.98148148 0.96296296 0.98148148 1. 0.96363636 0.96363636 0.96363636 1. 1. 
] mean value: 0.977979797979798 key: train_recall value: [0.99185336 0.99389002 0.99592668 0.99796334 0.99389002 0.99387755 0.99795918 0.99183673 0.99795918 0.99591837] mean value: 0.9951074441996758 key: test_roc_auc value: [0.96329966 0.99074074 0.97239057 0.99074074 0.98181818 0.97255892 0.97255892 0.97255892 0.99074074 0.99074074] mean value: 0.9798148148148148 key: train_roc_auc value: [0.99592668 0.99694501 0.99694293 0.99796126 0.99694501 0.99592045 0.99796126 0.99591837 0.99897959 0.99694085] mean value: 0.9970441414855148 key: test_jcc value: [0.92857143 0.98148148 0.94545455 0.98148148 0.96428571 0.94642857 0.94642857 0.94642857 0.98214286 0.98214286] mean value: 0.960484607984608 key: train_jcc value: [0.99185336 0.99389002 0.99390244 0.99593496 0.99389002 0.99185336 0.99592668 0.99183673 0.99795918 0.99389002] mean value: 0.9940936779063123 MCC on Blind test: 0.21 Accuracy on Blind test: 0.56 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.45224738 0.63778901 0.5324204 0.5638485 0.78409576 0.54516888 0.51332784 0.58729959 0.45964193 0.45682168] mean value: 0.5532660961151123 key: score_time value: [0.02797675 0.04070687 0.04118657 0.0417459 0.0412612 0.02333283 0.02900791 0.04153872 0.02351713 0.03968668] mean value: 0.03499605655670166 key: test_mcc value: [0.87171717 0.90841751 0.98181818 0.96329966 0.96396098 0.96393713 0.92719967 0.92719967 0.96329966 0.96393713] mean value: 0.9434786761144899 key: train_mcc value: [0.99593079 1. 0.99593079 0.99593079 0.99796333 0.99593082 0.99593082 0.99796334 0.99593082 0.99593082] mean value: 0.9967442309031642 key: test_accuracy value: [0.93577982 0.95412844 0.99082569 0.98165138 0.98165138 0.98165138 0.96330275 0.96330275 0.98165138 0.98165138] mean value: 0.9715596330275229 key: train_accuracy value: [0.99796126 1. 0.99796126 0.99796126 0.99898063 0.99796126 0.99796126 0.99898063 0.99796126 0.99796126] mean value: 0.9983690112130479 key: test_fscore value: [0.93577982 0.95412844 0.99082569 0.98148148 0.98181818 0.98214286 0.96428571 0.96428571 0.98181818 0.98214286] mean value: 0.9718708932929117 key: train_fscore value: [0.99796748 1. 0.99796748 0.99796748 0.99898271 0.99796334 0.99796334 0.99898063 0.99796334 0.99796334] mean value: 0.9983719137523378 key: test_precision value: [0.92727273 0.94545455 0.98181818 0.98148148 0.96428571 0.96491228 0.94736842 0.94736842 0.98181818 0.96491228] mean value: 0.9606692235639605 key: train_precision value: [0.9959432 1. 0.9959432 0.9959432 0.99796748 0.99593496 0.99593496 0.99796334 0.99593496 0.99593496] mean value: 0.9967500271799833 key: test_recall value: [0.94444444 0.96296296 1. 0.98148148 1. 1. 0.98181818 0.98181818 0.98181818 1. ] mean value: 0.9834343434343434 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93585859 0.95420875 0.99090909 0.98164983 0.98181818 0.98148148 0.96313131 0.96313131 0.98164983 0.98148148] mean value: 0.9715319865319865 key: train_roc_auc value: [0.99795918 1. 0.99795918 0.99795918 0.99897959 0.99796334 0.99796334 0.99898167 0.99796334 0.99796334] mean value: 0.9983692173407042 key: test_jcc value: [0.87931034 0.9122807 0.98181818 0.96363636 0.96428571 0.96491228 0.93103448 0.93103448 0.96428571 0.96491228] mean value: 0.9457510547528696 key: train_jcc value: [0.9959432 1. 
0.9959432 0.9959432 0.99796748 0.99593496 0.99593496 0.99796334 0.99593496 0.99593496] mean value: 0.9967500271799833 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [2.25848699 2.24054694 2.27098012 2.30772567 2.28769517 2.29379869 2.27469635 2.2514286 2.25546622 2.24115229] mean value: 2.268197703361511 key: score_time value: [0.00967145 0.00950313 0.00958705 0.01027203 0.00979424 0.01037478 0.00971627 0.00944853 0.00973773 0.00967526] mean value: 0.009778046607971191 key: test_mcc value: [0.92659933 1. 0.96393713 0.98181211 0.96396098 0.94511785 0.94511785 0.94511785 0.98181211 0.94635821] mean value: 0.9599833416841214 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
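The key/value blocks printed for every model in this log (fit_time, score_time, test_mcc, train_mcc, test_accuracy, ..., test_jcc) have the shape of the dictionary returned by scikit-learn's cross_validate with return_train_score=True over 10 folds, with MCC and the Jaccard coefficient (jcc) supplied through make_scorer. The sketch below shows one way such a run could be wired up around the ColumnTransformer/Pipeline pattern shown in the reprs; the function name, scorer labels and CV settings are assumptions for illustration, not the actual ml_data_rt.py code.

# Hedged sketch of the preprocessing + 10-fold cross-validation pattern suggested
# by this log; names and settings are illustrative assumptions.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score

def run_cv(model, X_train, y_train, numerical_cols, categorical_cols):
    # 'prep' scales the numeric columns and one-hot encodes the categorical ones,
    # matching the ColumnTransformer printed for each pipeline in this log.
    prep = ColumnTransformer(
        transformers=[('num', MinMaxScaler(), numerical_cols),
                      ('cat', OneHotEncoder(), categorical_cols)],
        remainder='passthrough')
    pipe = Pipeline(steps=[('prep', prep), ('model', model)])

    scoring = {'mcc': make_scorer(matthews_corrcoef),
               'accuracy': 'accuracy',
               'fscore': 'f1',
               'precision': 'precision',
               'recall': 'recall',
               'roc_auc': 'roc_auc',
               'jcc': make_scorer(jaccard_score)}
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    # cross_validate returns one array per key with a value for each fold,
    # i.e. the "key: ... value: [...] mean value: ..." lines seen in this log.
    scores = cross_validate(pipe, X_train, y_train, cv=cv, scoring=scoring,
                            return_train_score=True, n_jobs=10)
    for key, value in scores.items():
        print('key:', key, 'value:', value, 'mean value:', np.mean(value))
    return pipe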
mean value: 1.0 key: test_accuracy value: [0.96330275 1. 0.98165138 0.99082569 0.98165138 0.97247706 0.97247706 0.97247706 0.99082569 0.97247706] mean value: 0.9798165137614679 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 1. 0.98113208 0.99065421 0.98181818 0.97247706 0.97247706 0.97247706 0.99099099 0.97345133] mean value: 0.979844093694549 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 1. 1. 1. 0.96428571 0.98148148 0.98148148 0.98148148 0.98214286 0.94827586] mean value: 0.9802111840904945 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.96296296 0.98148148 1. 0.96363636 0.96363636 0.96363636 1. 1. ] mean value: 0.9798316498316498 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96329966 1. 0.98148148 0.99074074 0.98181818 0.97255892 0.97255892 0.97255892 0.99074074 0.97222222] mean value: 0.9797979797979798 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 1. 0.96296296 0.98148148 0.96428571 0.94642857 0.94642857 0.94642857 0.98214286 0.94827586] mean value: 0.9607006020799124 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined 
and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04199314 0.04448581 0.05012202 0.05785346 0.04534864 0.04628587 0.04821253 0.04629254 0.04560542 0.06620932] mean value: 0.04924087524414063 key: score_time value: [0.02291536 0.0129447 0.01291752 0.02138734 0.01435804 0.01392055 0.02247977 0.01402569 0.01400256 0.02144146] mean value: 0.01703929901123047 key: test_mcc value: [0.98181211 1. 1. 1. 1. 1. 1. 1. 0.98181818 1. ] mean value: 0.9963630295440247 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.99082569 1. 1. 1. 1. 1. 1. 1. 0.99082569 1. ] mean value: 0.998165137614679 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
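The repeated "Variables are collinear" message is raised by QuadraticDiscriminantAnalysis when it finds (near-)linearly dependent feature columns within a class, which is plausible with 123 correlated AAindex descriptors in the input. The sketch below shows two common ways to address it, regularising the covariance estimate or decorrelating the features first; neither is taken from the original script, and reg_param=0.1 and the 95% variance cut-off are illustrative values only.

```python
# Sketch only, not the script's code: two common ways to tame QDA's
# "Variables are collinear" warning on correlated features.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Option 1: shrink each class covariance estimate towards the identity.
qda_regularised = QuadraticDiscriminantAnalysis(reg_param=0.1)

# Option 2: decorrelate the features with PCA before fitting QDA.
qda_after_pca = Pipeline(steps=[
    ('pca', PCA(n_components=0.95)),   # keep components explaining ~95% of the variance
    ('qda', QuadraticDiscriminantAnalysis()),
])
```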
mean value: 1.0 key: test_fscore value: [0.99065421 1. 1. 1. 1. 1. 1. 1. 0.99082569 1. ] mean value: 0.9981479893680871 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.98148148 1. 1. 1. 1. 1. 1. 1. 0.98181818 1. ] mean value: 0.9963299663299663 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.99074074 1. 1. 1. 1. 1. 1. 1. 0.99090909 1. ] mean value: 0.9981649831649831 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.98148148 1. 1. 1. 1. 1. 1. 1. 0.98181818 1. ] mean value: 0.9963299663299663 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.035707 0.04087877 0.04267907 0.0428102 0.0426352 0.04230046 0.04302907 0.04247069 0.04260826 0.0430162 ] mean value: 0.04181349277496338 key: score_time value: [0.01995635 0.01925468 0.01918221 0.01924872 0.01916957 0.01920724 0.01924849 0.01924205 0.01930523 0.01914072] mean value: 0.019295525550842286 key: test_mcc value: [0.946411 0.90967354 0.98181818 0.98181818 0.96396098 1. 0.87869377 0.96393713 1. 0.91202529] mean value: 0.9538338072232938 key: train_mcc value: [0.98181631 0.9778193 0.9737407 0.9738378 0.9738378 0.97383919 0.9778203 0.97383919 0.97383919 0.97981668] mean value: 0.9760206475633355 key: test_accuracy value: [0.97247706 0.95412844 0.99082569 0.99082569 0.98165138 1. 0.93577982 0.98165138 1. 0.95412844] mean value: 0.9761467889908257 key: train_accuracy value: [0.99082569 0.98878695 0.98674822 0.98674822 0.98674822 0.98674822 0.98878695 0.98674822 0.98674822 0.98980632] mean value: 0.9878695208970438 key: test_fscore value: [0.97297297 0.95495495 0.99082569 0.99082569 0.98181818 1. 0.94017094 0.98214286 1. 0.95652174] mean value: 0.9770233022337131 key: train_fscore value: [0.99091826 0.98892246 0.98690836 0.98693467 0.98693467 0.98690836 0.9889001 0.98690836 0.98690836 0.98989899] mean value: 0.9880142593158917 key: test_precision value: [0.94736842 0.92982456 0.98181818 0.98181818 0.96428571 1. 0.88709677 0.96491228 1. 0.91666667] mean value: 0.9573790781940188 key: train_precision value: [0.982 0.97808765 0.97609562 0.97420635 0.97420635 0.97415507 0.97804391 0.97415507 0.97415507 0.98 ] mean value: 0.9765105086268133 key: test_recall value: [1. 0.98148148 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9981481481481481 key: train_recall value: [1. 1. 0.99796334 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.99979633401222 key: test_roc_auc value: [0.97272727 0.9543771 0.99090909 0.99090909 0.98181818 1. 0.93518519 0.98148148 1. 0.9537037 ] mean value: 0.976111111111111 key: train_roc_auc value: [0.99081633 0.98877551 0.98673677 0.98673469 0.98673469 0.98676171 0.98879837 0.98676171 0.98676171 0.9898167 ] mean value: 0.9878698200257701 key: test_jcc value: [0.94736842 0.9137931 0.98181818 0.98181818 0.96428571 1. 0.88709677 0.96491228 1. 
0.91666667] mean value: 0.9557759323984955 key: train_jcc value: [0.982 0.97808765 0.97415507 0.97420635 0.97420635 0.97415507 0.97804391 0.97415507 0.97415507 0.98 ] mean value: 0.9763164538320758 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.26908207 0.36156917 0.38551164 0.42597938 0.36128283 0.35268307 0.35298371 0.35420609 0.44186163 0.36942291] mean value: 0.36745824813842776 key: score_time value: [0.02261543 0.01918912 0.02010345 0.01927495 0.01930094 0.02481413 0.01924181 0.01960397 0.02101731 0.01992035] mean value: 0.02050814628601074 key: test_mcc value: [0.88989899 0.92724828 0.98181818 0.98181818 0.96396098 1. 
0.8582788 0.98181211 0.96396098 0.91202529] mean value: 0.9460821804669207 key: train_mcc value: [0.96766902 0.96359022 0.9737407 0.9738378 0.95942381 0.95961736 0.9778203 0.95742826 0.957524 0.96758197] mean value: 0.9658233448957639 key: test_accuracy value: [0.94495413 0.96330275 0.99082569 0.99082569 0.98165138 1. 0.9266055 0.99082569 0.98165138 0.95412844] mean value: 0.9724770642201835 key: train_accuracy value: [0.98369011 0.98165138 0.98674822 0.98674822 0.97961264 0.97961264 0.98878695 0.97859327 0.97859327 0.98369011] mean value: 0.9827726809378186 key: test_fscore value: [0.94444444 0.96363636 0.99082569 0.99082569 0.98181818 1. 0.93103448 0.99099099 0.98148148 0.95652174] mean value: 0.9731579060407307 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:135: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:138: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
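The two SettingWithCopyWarning messages above point at rpob_rt.py lines 135 and 138, where sort_values(..., inplace=True) is called on what pandas believes is a slice of another DataFrame. A hedged sketch of the usual remedies follows; the toy DataFrame and the filter expression are made up for illustration, and only the sort call mirrors the script.

```python
# Sketch of the standard SettingWithCopyWarning fixes; the filter and the toy
# DataFrame are illustrative, smnc_CT is the slice name the warning refers to.
import pandas as pd

scores = pd.DataFrame({'model': ['LR', 'RF', 'XGB'], 'test_mcc': [0.72, 0.65, 0.70]})

# Fix 1: take an explicit copy of the slice before mutating it in place.
smnc_CT = scores[scores['test_mcc'] > 0.6].copy()
smnc_CT.sort_values(by=['test_mcc'], ascending=False, inplace=True)

# Fix 2: avoid inplace mutation altogether and rebind the sorted result.
smnc_CT = scores[scores['test_mcc'] > 0.6].sort_values(by=['test_mcc'], ascending=False)
```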
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.98390342 0.98189135 0.98690836 0.98693467 0.97983871 0.97987928 0.9889001 0.97880928 0.97885196 0.98383838] mean value: 0.9829755517864163 key: test_precision value: [0.94444444 0.94642857 0.98181818 0.98181818 0.96428571 1. 0.8852459 0.98214286 1. 0.91666667] mean value: 0.9602850519243962 key: train_precision value: [0.972167 0.97017893 0.97609562 0.97420635 0.97005988 0.96626984 0.97804391 0.96806387 0.96620278 0.974 ] mean value: 0.9715288180430208 key: test_recall value: [0.94444444 0.98148148 1. 1. 1. 1. 0.98181818 1. 0.96363636 1. ] mean value: 0.9871380471380471 key: train_recall value: [0.99592668 0.99389002 0.99796334 1. 0.9898167 0.99387755 1. 0.98979592 0.99183673 0.99387755] mean value: 0.9946984496446236 key: test_roc_auc value: [0.94494949 0.96346801 0.99090909 0.99090909 0.98181818 1. 0.92609428 0.99074074 0.98181818 0.9537037 ] mean value: 0.9724410774410774 key: train_roc_auc value: [0.98367763 0.98163889 0.98673677 0.98673469 0.97960223 0.97962717 0.98879837 0.97860468 0.97860676 0.98370049] mean value: 0.9827727669479197 key: test_jcc value: [0.89473684 0.92982456 0.98181818 0.98181818 0.96428571 1. 0.87096774 0.98214286 0.96363636 0.91666667] mean value: 0.9485897110812221 key: train_jcc value: [0.96831683 0.96442688 0.97415507 0.97420635 0.96047431 0.96055227 0.97804391 0.95849802 0.95857988 0.96819085] mean value: 0.9665444376905993 MCC on Blind test: 0.32 Accuracy on Blind test: 0.61 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04058647 0.04435062 0.04309654 0.04436064 0.04376888 0.0449791 0.04290581 0.04352117 0.04341054 0.04521823] mean value: 0.04361979961395264 key: score_time value: [0.01262784 0.01453471 0.01264024 0.01438498 0.01472068 0.01512718 0.01484346 0.01487494 0.0149014 0.01476884] mean value: 0.014342427253723145 key: test_mcc value: [0.946411 0.96396098 0.98181818 0.96396098 0.96396098 0.94635821 0.91202529 0.98181211 0.96393713 0.92905936] mean value: 0.9553304233697641 key: train_mcc value: [0.96987162 0.96987162 0.96592058 0.96592058 0.96987162 0.97185442 0.97383919 0.97383919 0.96395333 0.97582781] mean value: 0.9700769986001025 key: test_accuracy value: [0.97247706 0.98165138 0.99082569 0.98165138 0.98165138 0.97247706 0.95412844 0.99082569 0.98165138 0.96330275] mean value: 0.9770642201834863 key: train_accuracy value: [0.98470948 0.98470948 0.98267074 0.98267074 0.98470948 0.98572885 0.98674822 0.98674822 0.98165138 0.98776758] mean value: 0.9848114169215086 key: test_fscore value: [0.97297297 0.98181818 0.99082569 0.98181818 0.98181818 0.97345133 0.95652174 0.99099099 0.98214286 0.96491228] mean value: 0.9777272401900579 key: train_fscore value: [0.98495486 0.98495486 0.98298298 0.98298298 0.98495486 0.98591549 0.98690836 0.98690836 0.98196393 0.98790323] mean value: 0.9850429923386353 key: test_precision value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586 0.91666667 0.98214286 0.96491228 0.93220339] mean value: 0.9566244802138708 key: train_precision value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222 0.97415507 0.97415507 0.96456693 0.97609562] mean value: 0.9705332967868592 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
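Each model block above reports ten per-fold values for fit_time, score_time and for MCC, accuracy, F-score, precision, recall, ROC AUC and Jaccard on both the CV test and train splits, followed by a single MCC/accuracy pair on the blind test set. One plausible way to produce this shape of output with scikit-learn is sketched below on synthetic data; the scorer names, the 10-fold splitter and the simple logistic-regression pipeline are assumptions, not code lifted from the script.

```python
# Hedged sketch: per-fold CV metrics plus a blind-test evaluation, mirroring the
# quantities printed in the log. The synthetic data and the plain pipeline below
# are placeholders for the script's real inputs and models.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_validate, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef, jaccard_score, accuracy_score

# Imbalanced toy data standing in for the training and blind-test frames.
X, y = make_classification(n_samples=600, n_features=20, weights=[0.9], random_state=42)
X_train, X_bts, y_train, y_bts = train_test_split(X, y, stratify=y, random_state=42)

pipe = Pipeline(steps=[('prep', MinMaxScaler()),
                       ('model', LogisticRegression(random_state=42))])

scoring = {
    'mcc'      : make_scorer(matthews_corrcoef),
    'accuracy' : 'accuracy',
    'fscore'   : 'f1',
    'precision': 'precision',
    'recall'   : 'recall',
    'roc_auc'  : 'roc_auc',
    'jcc'      : make_scorer(jaccard_score),
}

cv_out = cross_validate(pipe, X_train, y_train,
                        cv=StratifiedKFold(n_splits=10),
                        scoring=scoring,
                        return_train_score=True)   # yields test_* and train_* arrays

for key, value in cv_out.items():
    print('key:', key, 'value:', value, 'mean value:', value.mean())

# Blind test: refit on the full training data, score once on the held-out split.
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_bts)
print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))
```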
mean value: 1.0 key: test_roc_auc value: [0.97272727 0.98181818 0.99090909 0.98181818 0.98181818 0.97222222 0.9537037 0.99074074 0.98148148 0.96296296] mean value: 0.977020202020202 key: train_roc_auc value: [0.98469388 0.98469388 0.98265306 0.98265306 0.98469388 0.98574338 0.98676171 0.98676171 0.98167006 0.98778004] mean value: 0.9848104659379027 key: test_jcc value: [0.94736842 0.96428571 0.98181818 0.96428571 0.96428571 0.94827586 0.91666667 0.98214286 0.96491228 0.93220339] mean value: 0.9566244802138708 key: train_jcc value: [0.97035573 0.97035573 0.96653543 0.96653543 0.97035573 0.97222222 0.97415507 0.97415507 0.96456693 0.97609562] mean value: 0.9705332967868592 MCC on Blind test: 0.33 Accuracy on Blind test: 0.61 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
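The lbfgs ConvergenceWarning repeated through this part of the log presumably comes from the many logistic-regression fits that LogisticRegressionCV performs across its internal folds and regularisation grid. The warning text itself names the two standard remedies, raising max_iter or rescaling the inputs; a minimal sketch of both follows, with illustrative values rather than the settings actually used by the script.

```python
# Sketch only: the two remedies named in the ConvergenceWarning text itself.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

# Remedy 1: allow lbfgs more iterations than the default 100 (value is illustrative).
lr_cv_more_iter = LogisticRegressionCV(max_iter=3000, random_state=42)

# Remedy 2: standardise the features so the optimiser converges within the limit.
lr_cv_scaled = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model', LogisticRegressionCV(random_state=42)),
])
```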
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.17332339 1.07144642 1.20221496 1.05029082 1.10251975 1.00360799 1.09597087 1.00588417 1.12927198 1.03671074] mean value: 1.0871241092681885 key: score_time value: [0.01915336 0.0194943 0.01916051 0.03661346 0.01905918 0.0230329 0.02343559 0.01981854 0.02063799 0.01904988] mean value: 0.02194557189941406 key: test_mcc value: [0.96396098 0.96396098 0.98181818 0.98181818 0.9291517 0.94635821 0.92905936 0.98181211 0.98181211 0.92905936] mean value: 0.9588811186918972 key: train_mcc value: [1. 0.99796333 1. 1. 1. 0.9778203 0.99796334 1. 1. 0.99796334] mean value: 0.9971710313107653 key: test_accuracy value: [0.98165138 0.98165138 0.99082569 0.99082569 0.96330275 0.97247706 0.96330275 0.99082569 0.99082569 0.96330275] mean value: 0.9788990825688073 key: train_accuracy value: [1. 0.99898063 1. 1. 1. 0.98878695 0.99898063 1. 1. 0.99898063] mean value: 0.9985728848114169 key: test_fscore value: [0.98181818 0.98181818 0.99082569 0.99082569 0.96428571 0.97345133 0.96491228 0.99099099 0.99099099 0.96491228] mean value: 0.9794831324887986 key: train_fscore value: [1. 0.99898271 1. 1. 1. 0.9889001 0.99898063 1. 1. 0.99898063] mean value: 0.9985844070926517 key: test_precision value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586 0.93220339 0.98214286 0.98214286 0.93220339] mean value: 0.9600210630982109 key: train_precision value: [1. 0.99796748 1. 1. 1. 0.97804391 0.99796334 1. 1. 0.99796334] mean value: 0.9971938072094845 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98181818 0.98181818 0.99090909 0.99090909 0.96363636 0.97222222 0.96296296 0.99074074 0.99074074 0.96296296] mean value: 0.9788720538720539 key: train_roc_auc value: [1. 0.99897959 1. 1. 1. 0.98879837 0.99898167 1. 1. 0.99898167] mean value: 0.9985741302631032 key: test_jcc value: [0.96428571 0.96428571 0.98181818 0.98181818 0.93103448 0.94827586 0.93220339 0.98214286 0.98214286 0.93220339] mean value: 0.9600210630982109 key: train_jcc value: [1. 0.99796748 1. 1. 1. 0.97804391 0.99796334 1. 1. 
0.99796334] mean value: 0.9971938072094845 MCC on Blind test: 0.21 Accuracy on Blind test: 0.56 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.03226662 0.01357055 0.01272416 0.01376796 0.01404357 0.01343679 0.01265812 0.01321292 0.01250315 0.01331568] mean value: 0.015149950981140137 key: score_time value: [0.01026297 0.01015615 0.01043391 0.00975466 0.01017189 0.00939393 0.009799 0.00966072 0.0104816 0.00953364] mean value: 0.009964847564697265 key: test_mcc value: [0.76807644 0.8585559 0.77238262 0.75632883 0.75156361 0.69717653 0.82131618 0.846254 0.87280881 0.75090861] mean value: 0.7895371517463353 key: train_mcc value: [0.79150846 0.78327281 0.79289836 0.79289836 0.79885223 0.79385491 0.80032507 0.79025852 0.78014362 0.79156576] mean value: 0.7915578108083023 key: test_accuracy value: [0.88073394 0.9266055 0.88073394 0.87155963 0.87155963 0.8440367 0.90825688 0.91743119 0.93577982 0.87155963] mean value: 0.8908256880733945 key: train_accuracy value: [0.89194699 0.88786952 0.89296636 0.89296636 0.89602446 0.89296636 0.89704383 0.89194699 0.88583078 0.89194699] mean value: 0.8921508664627931 key: test_fscore value: [0.88695652 0.92982456 0.88888889 0.88135593 0.87931034 0.85714286 0.9137931 0.92436975 0.9380531 0.88135593] mean value: 0.8981050987101319 key: train_fscore value: [0.89904762 0.8952381 0.89971347 0.89971347 0.90248566 0.89990467 0.90297791 0.89827255 0.89353612 0.89885496] mean value: 0.898974452130224 key: test_precision value: [0.83606557 0.88333333 0.82539683 0.8125 0.82258065 0.796875 0.86885246 0.859375 0.9137931 0.82539683] mean value: 0.8444168765523435 key: train_precision value: [0.84436494 0.84078712 0.8471223 0.8471223 0.85045045 0.84436494 0.85299456 0.84782609 0.83629893 0.84408602] mean value: 0.8455417645600413 key: test_recall value: [0.94444444 0.98148148 0.96296296 0.96296296 0.94444444 0.92727273 0.96363636 1. 
0.96363636 0.94545455] mean value: 0.9596296296296296 key: train_recall value: [0.96130346 0.95723014 0.9592668 0.9592668 0.96130346 0.96326531 0.95918367 0.95510204 0.95918367 0.96122449] mean value: 0.959632985577123 key: test_roc_auc value: [0.88131313 0.92710438 0.88148148 0.87239057 0.87222222 0.84326599 0.90774411 0.91666667 0.93552189 0.87087542] mean value: 0.8908585858585859 key: train_roc_auc value: [0.89187622 0.88779874 0.89289871 0.89289871 0.89595785 0.89303795 0.89710711 0.89201131 0.88590548 0.89201754] mean value: 0.892150962217881 key: test_jcc value: [0.796875 0.86885246 0.8 0.78787879 0.78461538 0.75 0.84126984 0.859375 0.88333333 0.78787879] mean value: 0.8160078593992528 key: train_jcc value: [0.816609 0.81034483 0.81770833 0.81770833 0.82229965 0.81802426 0.82311734 0.81533101 0.80756014 0.81629116] mean value: 0.8164994052884171 MCC on Blind test: 0.46 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01432371 0.01744103 0.01744962 0.01802087 0.01732731 0.01733041 0.0174067 0.01744223 0.01742673 0.01739192] mean value: 0.017156052589416503 key: score_time value: [0.01219678 0.01250386 0.01259708 0.01271725 0.01278138 0.01281834 0.01276517 0.01271963 0.01281548 0.01277828] mean value: 0.01266932487487793 key: test_mcc value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759 0.7280447 0.72598622 0.74351235 0.76486923] mean value: 0.7298888092654613 key: train_mcc value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039 0.73419371 0.73010884 0.75399333 0.72585556] mean value: 0.7346532327609935 key: test_accuracy value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688 0.86238532 0.86238532 0.87155963 0.88073394] mean value: 0.8642201834862385 key: train_accuracy value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911 0.86646279 0.86442406 0.87665647 0.86238532] mean value: 0.8667686034658512 key: test_fscore value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038 0.85714286 0.85981308 0.875 0.87619048] mean value: 0.8608265343006679 key: train_fscore value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602 0.86225026 0.86014721 0.8738269 0.85834208] mean value: 0.8630178891837997 key: test_precision value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647 0.9 0.88461538 0.85964912 0.92 ] mean value: 0.883217483239155 key: train_precision value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882 0.88937093 0.88720174 0.89339019 0.88336933] mean value: 0.8878767209813069 key: test_recall value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727 0.81818182 0.83636364 0.89090909 0.83636364] mean value: 0.8402693602693603 key: train_recall value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224 0.83673469 0.83469388 0.85510204 0.83469388] mean value: 0.8395502722473919 key: test_roc_auc value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586 0.86279461 0.86262626 0.87138047 0.88114478] mean value: 0.8642424242424243 key: train_roc_auc value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299 0.86643252 0.86439378 0.87663452 0.86235712] mean value: 0.8667689845795752 key: test_jcc value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621 0.75 0.75409836 0.77777778 0.77966102] mean value: 0.7566277886609455 key: train_jcc value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073 0.75785582 0.75461255 0.77592593 0.75183824] mean value: 0.7591101044424549 MCC on Blind test: 0.36 Accuracy on Blind test: 0.66 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01675367 0.01172948 0.01112866 0.01174927 0.01270866 0.01112056 0.01259708 0.01317763 0.01103449 0.01173782] mean value: 0.012373733520507812 key: score_time value: [0.03388858 0.01459813 0.0150125 0.01497555 0.0157063 0.01583314 0.01537919 0.0203886 0.01956248 0.01527023] mean value: 0.0180614709854126 key: test_mcc value: [0.91216737 0.946411 0.946411 0.98181818 1. 0.94635821 0.91202529 0.91202529 1. 0.96393713] mean value: 0.9521153465030314 key: train_mcc value: [0.96789422 0.96592058 0.96592058 0.96395068 0.96198449 0.96198744 0.96395333 0.97185442 0.96002526 0.96002526] mean value: 0.9643516270143173 key: test_accuracy value: [0.95412844 0.97247706 0.97247706 0.99082569 1. 0.97247706 0.95412844 0.95412844 1. 0.98165138] mean value: 0.9752293577981652 key: train_accuracy value: [0.98369011 0.98267074 0.98267074 0.98165138 0.98063201 0.98063201 0.98165138 0.98572885 0.97961264 0.97961264] mean value: 0.981855249745158 key: test_fscore value: [0.95575221 0.97297297 0.97297297 0.99082569 1. 0.97345133 0.95652174 0.95652174 1. 
0.98214286] mean value: 0.9761161509246076 key: train_fscore value: [0.98396794 0.98298298 0.98298298 0.982 0.98101898 0.98098098 0.98196393 0.98591549 0.98 0.98 ] mean value: 0.9821813284651129 key: test_precision value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586 0.91666667 0.91666667 1. 0.96491228] mean value: 0.9538330737315633 key: train_precision value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191 0.96456693 0.97222222 0.96078431 0.96078431] mean value: 0.9649924005520801 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.97272727 0.97272727 0.99090909 1. 0.97222222 0.9537037 0.9537037 1. 0.98148148] mean value: 0.9752020202020202 key: train_roc_auc value: [0.98367347 0.98265306 0.98265306 0.98163265 0.98061224 0.98065173 0.98167006 0.98574338 0.9796334 0.9796334 ] mean value: 0.9818556465356 key: test_jcc value: [0.91525424 0.94736842 0.94736842 0.98181818 1. 0.94827586 0.91666667 0.91666667 1. 0.96491228] mean value: 0.9538330737315633 key: train_jcc value: [0.96844181 0.96653543 0.96653543 0.96463654 0.9627451 0.96267191 0.96456693 0.97222222 0.96078431 0.96078431] mean value: 0.9649924005520801 MCC on Blind test: 0.19 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), 
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0302546 0.02813029 0.02809978 0.0278976 0.02761269 0.02798128 0.03074837 0.02966928 0.03014135 0.02853847] mean value: 0.028907370567321778 key: score_time value: [0.01462579 0.01377559 0.01365614 0.01365614 0.01376009 0.0138247 0.01475239 0.01382637 0.01384187 0.01394892] mean value: 0.013966798782348633 key: test_mcc value: [0.96396098 0.98181818 1. 1. 0.96396098 1. 0.94635821 0.98181211 1. 0.94635821] mean value: 0.9784268692563587 key: train_mcc value: [0.98382069 0.98382069 0.98181631 0.98181631 0.98382069 0.98181698 0.98582943 0.98382123 0.97981668 0.98582943] mean value: 0.983220845209887 key: test_accuracy value: [0.98165138 0.99082569 1. 1. 0.98165138 1. 0.97247706 0.99082569 1. 0.97247706] mean value: 0.9889908256880734 key: train_accuracy value: [0.99184506 0.99184506 0.99082569 0.99082569 0.99184506 0.99082569 0.99286442 0.99184506 0.98980632 0.99286442] mean value: 0.991539245667686 key: test_fscore value: [0.98181818 0.99082569 1. 1. 0.98181818 1. 0.97345133 0.99099099 1. 0.97345133] mean value: 0.9892355697568005 key: train_fscore value: [0.99191919 0.99191919 0.99091826 0.99091826 0.99191919 0.9908999 0.9929078 0.99190283 0.98989899 0.9929078 ] mean value: 0.9916111430148137 key: test_precision value: [0.96428571 0.98181818 1. 1. 0.96428571 1. 0.94827586 0.98214286 1. 0.94827586] mean value: 0.9789084191670399 key: train_precision value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393 0.98591549 0.98393574 0.98 0.98591549] mean value: 0.9833634464358323 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98181818 0.99090909 1. 1. 0.98181818 1. 0.97222222 0.99074074 1. 0.97222222] mean value: 0.988973063973064 key: train_roc_auc value: [0.99183673 0.99183673 0.99081633 0.99081633 0.99183673 0.99083503 0.99287169 0.99185336 0.9898167 0.99287169] mean value: 0.9915391329647949 key: test_jcc value: [0.96428571 0.98181818 1. 1. 0.96428571 1. 0.94827586 0.98214286 1. 
0.94827586] mean value: 0.9789084191670399 key: train_jcc value: [0.98396794 0.98396794 0.982 0.982 0.98396794 0.98196393 0.98591549 0.98393574 0.98 0.98591549] mean value: 0.9833634464358323 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.71428227 2.08614016 2.62061143 2.41919136 2.2262578 2.1079421 2.22521067 2.41262627 2.21698761 2.23710108] mean value: 2.2266350746154786 key: score_time value: [0.01254702 0.01257324 0.01411819 0.01264691 0.01291251 0.01294541 0.01289654 0.01295042 0.01260424 0.01268959] mean value: 0.012888407707214356 key: test_mcc value: [0.96396098 0.96396098 0.98181818 1. 0.96396098 0.98181211 0.92905936 0.98181211 1. 
0.92905936] mean value: 0.9695444073814772 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98165138 0.98165138 0.99082569 1. 0.98165138 0.99082569 0.96330275 0.99082569 1. 0.96330275] mean value: 0.9844036697247707 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.98181818 0.99082569 1. 0.98181818 0.99099099 0.96491228 0.99099099 1. 0.96491228] mean value: 0.9848086776913431 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286 0.93220339 0.98214286 1. 0.93220339] mean value: 0.9703367818622056 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98181818 0.98181818 0.99090909 1. 0.98181818 0.99074074 0.96296296 0.99074074 1. 0.96296296] mean value: 0.9843771043771044 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.96428571 0.98181818 1. 0.96428571 0.98214286 0.93220339 0.98214286 1. 0.93220339] mean value: 0.9703367818622056 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.29 Accuracy on Blind test: 0.59 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model 
pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04392576 0.0322082 0.03475714 0.03228712 0.0325017 0.033427 0.03042507 0.03155088 0.03268433 0.03452039] mean value: 0.03382875919342041 key: score_time value: [0.0112896 0.00917888 0.00912571 0.00906205 0.0092423 0.00923324 0.00922656 0.00922465 0.00923491 0.00925946] mean value: 0.009407734870910645 key: test_mcc value: [0.9291517 0.946411 0.98181818 1. 0.98181818 1. 0.94635821 0.98181211 0.98181211 0.94635821] mean value: 0.969553972019105 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96330275 0.97247706 0.99082569 1. 0.99082569 1. 0.97247706 0.99082569 0.99082569 0.97247706] mean value: 0.9844036697247707 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.97297297 0.99082569 1. 0.99082569 1. 0.97345133 0.99099099 0.99099099 0.97345133] mean value: 0.9847794700254715 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1. 0.94827586 0.98214286 0.98214286 0.94827586] mean value: 0.9702876705871261 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96363636 0.97272727 0.99090909 1. 0.99090909 1. 0.97222222 0.99074074 0.99074074 0.97222222] mean value: 0.9844107744107744 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.94736842 0.98181818 1. 0.98181818 1. 0.94827586 0.98214286 0.98214286 0.94827586] mean value: 0.9702876705871261 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
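Each model's key:/value:/mean value: block above is a set of per-fold cross-validation arrays: fit_time, score_time, and a test_*/train_* pair for every metric. A minimal sketch of how output with those keys can be produced with scikit-learn's cross_validate; the toy data, the 10-fold splitter and the exact scorer set are assumptions, only the key names mirror the log:

# Sketch: multi-metric cross-validation whose result keys (fit_time, test_mcc,
# train_mcc, ...) match the blocks printed above. Illustrative data and CV setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import jaccard_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=575, n_features=30, weights=[0.9], random_state=42)

scoring = {
    "mcc": make_scorer(matthews_corrcoef),
    "accuracy": "accuracy",
    "fscore": "f1",
    "precision": "precision",
    "recall": "recall",
    "roc_auc": "roc_auc",
    "jcc": make_scorer(jaccard_score),
}

cv_out = cross_validate(
    DecisionTreeClassifier(random_state=42), X, y,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    scoring=scoring, return_train_score=True,
)

for key, value in cv_out.items():      # fit_time, score_time, test_mcc, train_mcc, ...
    print("key:", key, "value:", value)
    print("mean value:", np.mean(value))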
mean value: 1.0 MCC on Blind test: 0.18 Accuracy on Blind test: 0.55 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.1274991 0.12812495 0.12774634 0.12863183 0.12888646 0.13110614 0.12780571 0.12781787 0.12792253 0.12815571] mean value: 0.12836966514587403 key: score_time value: [0.01829576 0.01841164 0.01846838 0.0185349 0.01836324 0.01849031 0.01845002 0.0185163 0.01841092 0.0183692 ] mean value: 0.01843106746673584 key: test_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
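Every 'Running model pipeline' entry wraps the estimator in the same preprocessing step: MinMaxScaler over the 169 numerical columns, OneHotEncoder over the 7 categorical ones, with remainder='passthrough'. A cut-down sketch of that construction; the four-row frame and its values are hypothetical, only the column names and the transformer layout come from the log:

# Sketch of the 'prep' ColumnTransformer used in every pipeline above,
# shrunk to two numeric and two categorical columns with made-up values.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

X = pd.DataFrame({
    "ligand_distance": [1.2, 3.4, 0.7, 5.1],
    "deepddg": [-0.3, 0.8, -1.1, 0.2],
    "ss_class": ["helix", "sheet", "coil", "helix"],
    "active_site": ["yes", "no", "no", "yes"],
})
y = [1, 0, 0, 1]

prep = ColumnTransformer(
    transformers=[
        ("num", MinMaxScaler(), ["ligand_distance", "deepddg"]),
        ("cat", OneHotEncoder(), ["ss_class", "active_site"]),
    ],
    remainder="passthrough",
)

pipe = Pipeline(steps=[("prep", prep), ("model", ExtraTreesClassifier(random_state=42))])
pipe.fit(X, y)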
mean value: 1.0 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.54 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01226139 0.01228285 0.01226473 0.01214695 0.01218414 0.01226926 0.01206851 0.01219273 0.01233721 0.01204967] mean value: 0.012205743789672851 key: score_time value: [0.00921822 0.00932813 0.00931048 0.00925422 0.00949788 0.00925541 0.0092206 0.00980568 0.00936985 0.0093658 ] mean value: 0.009362626075744628 key: test_mcc value: [0.946411 0.98181818 0.98181818 0.98181818 0.946411 0.96393713 0.98181211 0.96393713 0.96393713 0.96393713] mean value: 0.9675837174332553 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97247706 0.99082569 0.99082569 0.99082569 0.97247706 0.98165138 0.99082569 0.98165138 0.98165138 0.98165138] mean value: 0.9834862385321101 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97297297 0.99082569 0.99082569 0.99082569 0.97297297 0.98214286 0.99099099 0.98214286 0.98214286 0.98214286] mean value: 0.9837985429728549 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228 0.98214286 0.96491228 0.96491228 0.96491228] mean value: 0.9681983367509683 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.99090909 0.99090909 0.99090909 0.97272727 0.98148148 0.99074074 0.98148148 0.98148148 0.98148148] mean value: 0.9834848484848485 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94736842 0.98181818 0.98181818 0.98181818 0.94736842 0.96491228 0.98214286 0.96491228 0.96491228 0.96491228] mean value: 0.9681983367509683 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
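The 'MCC on Blind test' and 'Accuracy on Blind test' figures reported after each block come from scoring the fitted pipeline on the held-out blind test set. A minimal sketch of that final step, where pipe, X_blind and y_blind are hypothetical placeholders for the fitted pipeline and the blind-test split:

# Sketch of the blind-test report; pipe, X_blind and y_blind stand in for the
# fitted pipeline and the held-out test data (names are hypothetical).
from sklearn.metrics import accuracy_score, matthews_corrcoef

def blind_test_report(pipe, X_blind, y_blind):
    # Print MCC and accuracy on the blind test set, as in the log above.
    y_pred = pipe.predict(X_blind)
    print("MCC on Blind test:", round(matthews_corrcoef(y_blind, y_pred), 2))
    print("Accuracy on Blind test:", round(accuracy_score(y_blind, y_pred), 2))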
mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.57 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.84613895 1.84882665 1.8708818 1.86475921 1.84473586 1.86491394 1.85439205 1.88788891 1.88814402 1.87820005] mean value: 1.8648881435394287 key: score_time value: [0.10223675 0.09952426 0.09806609 0.09618688 0.09979129 0.1016283 0.09665799 0.10203648 0.09662509 0.10175681] mean value: 0.09945099353790283 key: test_mcc value: [0.96396098 0.98181818 1. 1. 1. 1. 0.98181211 0.98181211 1. 0.98181211] mean value: 0.9891215506967859 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98165138 0.99082569 1. 1. 1. 1. 0.99082569 0.99082569 1. 
0.99082569] mean value: 0.9944954128440368 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.99082569 1. 1. 1. 1. 0.99099099 0.99099099 1. 0.99099099] mean value: 0.9945616842864549 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.98181818 1. 1. 1. 1. 0.98214286 0.98214286 1. 0.98214286] mean value: 0.9892532467532468 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98181818 0.99090909 1. 1. 1. 1. 0.99074074 0.99074074 1. 0.99074074] mean value: 0.9944949494949495 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.98181818 1. 1. 1. 1. 0.98214286 0.98214286 1. 0.98214286] mean value: 0.9892532467532468 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.21 Accuracy on Blind test: 0.55 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
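The FutureWarning above goes away once the deprecated max_features='auto' is spelled out as the value it already maps to for classifiers. An illustrative replacement for the Random Forest2 constructor listed in the models, shown as a sketch rather than a change made in this run:

# 'auto' is equivalent to 'sqrt' for RandomForestClassifier, so naming it
# explicitly silences the deprecation warning without changing behaviour.
from sklearn.ensemble import RandomForestClassifier

rf2 = RandomForestClassifier(max_features="sqrt", min_samples_leaf=5,
                             n_estimators=1000, n_jobs=10,
                             oob_score=True, random_state=42)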
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.09517479 1.02668071 1.03418374 1.04638362 1.05752635 1.01023602 1.05089188 1.00196314 1.07685375 1.11304617] mean value: 1.0512940168380738 key: score_time value: [0.28117323 0.28493404 0.23831916 0.273947 0.27481461 0.26104164 0.21751809 0.22988033 0.25270414 0.24373722] mean value: 0.2558069467544556 key: test_mcc value: [0.946411 0.96396098 1. 1. 0.96396098 1. 0.94635821 0.96393713 0.98181211 0.96393713] mean value: 0.9730377554095189 key: train_mcc value: [0.99187796 0.98784133 0.98784133 0.98985763 0.98784133 0.98582943 0.98582943 0.98784163 0.98985784 0.98985784] mean value: 0.9884475773437704 key: test_accuracy value: [0.97247706 0.98165138 1. 1. 0.98165138 1. 
0.97247706 0.98165138 0.99082569 0.98165138] mean value: 0.9862385321100917 key: train_accuracy value: [0.99592253 0.99388379 0.99388379 0.99490316 0.99388379 0.99286442 0.99286442 0.99388379 0.99490316 0.99490316] mean value: 0.9941896024464831 key: test_fscore value: [0.97297297 0.98181818 1. 1. 0.98181818 1. 0.97345133 0.98214286 0.99099099 0.98214286] mean value: 0.986533736931967 key: train_fscore value: [0.9959432 0.99392713 0.99392713 0.99493414 0.99392713 0.9929078 0.9929078 0.99391481 0.99492386 0.99492386] mean value: 0.9942236851131838 key: test_precision value: [0.94736842 0.96428571 1. 1. 0.96428571 1. 0.94827586 0.96491228 0.98214286 0.96491228] mean value: 0.9736183130239392 key: train_precision value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549 0.98591549 0.98790323 0.98989899 0.98989899] mean value: 0.9885153434454889 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.98181818 1. 1. 0.98181818 1. 0.97222222 0.98148148 0.99074074 0.98148148] mean value: 0.9862289562289562 key: train_roc_auc value: [0.99591837 0.99387755 0.99387755 0.99489796 0.99387755 0.99287169 0.99287169 0.99389002 0.99490835 0.99490835] mean value: 0.994189908142483 key: test_jcc value: [0.94736842 0.96428571 1. 1. 0.96428571 1. 0.94827586 0.96491228 0.98214286 0.96491228] mean value: 0.9736183130239392 key: train_jcc value: [0.99191919 0.98792757 0.98792757 0.98991935 0.98792757 0.98591549 0.98591549 0.98790323 0.98989899 0.98989899] mean value: 0.9885153434454889 MCC on Blind test: 0.29 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02011395 0.01671338 0.01653075 0.02227449 0.01953197 0.01918554 0.01853752 0.01888394 0.01923013 0.01840615] mean value: 0.01894078254699707 key: score_time value: [0.01299596 0.01254225 0.01255369 0.01404405 0.01341295 0.01391649 0.01417804 0.01381135 0.01417184 0.0138309 ] mean value: 0.013545751571655273 key: test_mcc value: [0.63446308 0.78024981 0.72570999 0.72570999 0.65151515 0.81882759 0.7280447 0.72598622 0.74351235 0.76486923] mean value: 0.7298888092654613 key: train_mcc value: [0.73813656 0.72410437 0.73869429 0.73014127 0.75785401 0.71345039 0.73419371 0.73010884 0.75399333 0.72585556] mean value: 0.7346532327609935 key: test_accuracy value: [0.81651376 0.88990826 0.86238532 0.86238532 0.82568807 0.90825688 0.86238532 0.86238532 0.87155963 0.88073394] mean value: 0.8642201834862385 key: train_accuracy value: [0.86850153 0.86136595 0.86850153 0.86442406 0.87869521 0.85626911 0.86646279 0.86442406 0.87665647 0.86238532] mean value: 0.8667686034658512 key: test_fscore value: [0.80769231 0.88679245 0.85714286 0.85714286 0.82568807 0.90566038 0.85714286 0.85981308 0.875 0.87619048] mean value: 0.8608265343006679 key: train_fscore value: [0.86492147 0.85714286 0.86406744 0.86044071 0.87668394 0.85235602 0.86225026 0.86014721 0.8738269 0.85834208] mean value: 0.8630178891837997 key: test_precision value: [0.84 0.90384615 0.88235294 0.88235294 0.81818182 0.94117647 0.9 0.88461538 0.85964912 0.92 ] mean value: 0.883217483239155 key: train_precision value: [0.89008621 0.88503254 0.89519651 0.88744589 0.89240506 0.87526882 0.88937093 0.88720174 0.89339019 0.88336933] mean value: 0.8878767209813069 key: test_recall value: [0.77777778 0.87037037 0.83333333 0.83333333 0.83333333 0.87272727 0.81818182 0.83636364 0.89090909 0.83636364] mean value: 0.8402693602693603 key: train_recall value: [0.84114053 0.83095723 0.83503055 0.83503055 0.86150713 0.83061224 0.83673469 0.83469388 0.85510204 0.83469388] mean value: 0.8395502722473919 key: test_roc_auc value: [0.81616162 0.88973064 0.86212121 0.86212121 0.82575758 0.90858586 0.86279461 0.86262626 0.87138047 0.88114478] mean value: 0.8642424242424243 key: train_roc_auc value: [0.86852945 0.86139698 0.86853568 0.86445405 0.87871275 0.85624299 0.86643252 0.86439378 0.87663452 0.86235712] mean value: 0.8667689845795752 key: test_jcc value: [0.67741935 0.79661017 0.75 0.75 0.703125 0.82758621 0.75 0.75409836 0.77777778 0.77966102] mean value: 0.7566277886609455 key: train_jcc value: [0.76199262 0.75 0.7606679 0.75506446 0.7804428 0.74270073 0.75785582 0.75461255 0.77592593 0.75183824] mean value: 0.7591101044424549 MCC on Blind test: 0.36 
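For context on the two blind-test lines reported after each model: they correspond to scoring the fitted pipeline on the held-out blind test set. A minimal sketch of how such figures are obtained, using hypothetical names ('fitted_pipeline', 'X_bts', 'y_bts' are assumptions for illustration, not the script's actual variables):

    from sklearn.metrics import accuracy_score, matthews_corrcoef

    def blind_test_summary(fitted_pipeline, X_bts, y_bts):
        # Score an already-fitted pipeline on the blind test split and print
        # the two summary lines that appear throughout this log.
        y_pred = fitted_pipeline.predict(X_bts)
        print('MCC on Blind test:', round(matthews_corrcoef(y_bts, y_pred), 2))
        print('Accuracy on Blind test:', round(accuracy_score(y_bts, y_pred), 2))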
Accuracy on Blind test: 0.66 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.11796188 0.10961556 0.10818744 0.10456371 0.1058588 0.1078229 0.25975823 0.10478783 0.10844398 0.13115287] mean value: 0.1258153200149536 key: score_time value: [0.01118398 0.011338 0.01129174 0.01120877 0.01139569 0.01124907 0.01117444 0.0112884 0.01124215 0.01141405] mean value: 0.011278629302978516 key: test_mcc value: [0.91216737 1. 1. 1. 0.96396098 0.98181211 0.96393713 0.98181211 0.98181211 0.96393713] mean value: 0.9749438950975768 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95412844 1. 1. 1. 0.98165138 0.99082569 0.98165138 0.99082569 0.99082569 0.98165138] mean value: 0.9871559633027523 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95575221 1. 1. 1. 0.98181818 0.99099099 0.98214286 0.99099099 0.99099099 0.98214286] mean value: 0.987482908146625 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91525424 1. 1. 1. 0.96428571 0.98214286 0.96491228 0.98214286 0.98214286 0.96491228] mean value: 0.975579308440593 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 1. 1. 1. 0.98181818 0.99074074 0.98148148 0.99074074 0.99074074 0.98148148] mean value: 0.9871548821548821 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.91525424 1. 1. 1. 0.96428571 0.98214286 0.96491228 0.98214286 0.98214286 0.96491228] mean value: 0.975579308440593 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.6 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.07150984 0.06511164 0.06523538 0.06628895 0.09167719 0.06410146 0.07649589 0.09414434 0.06980395 0.06749415] mean value: 0.07318627834320068 key: score_time value: [0.02362418 0.01245999 0.01245952 0.01243114 0.01258183 0.01239157 0.01956177 0.02616715 0.01251602 0.02478385] mean value: 0.016897702217102052 key: test_mcc value: [0.89544301 0.91216737 0.89544301 1. 0.946411 0.98181211 0.87869377 0.89524142 0.98181211 0.89524142] mean value: 0.9282265219647683 key: train_mcc value: [0.97582662 0.96987162 0.9738378 0.96789422 0.96987162 0.96789632 0.96987347 0.97185442 0.96592295 0.97383919] mean value: 0.9706688251163492 key: test_accuracy value: [0.94495413 0.95412844 0.94495413 1. 
0.97247706 0.99082569 0.93577982 0.94495413 0.99082569 0.94495413] mean value: 0.9623853211009175 key: train_accuracy value: [0.98776758 0.98470948 0.98674822 0.98369011 0.98470948 0.98369011 0.98470948 0.98572885 0.98267074 0.98674822] mean value: 0.9851172273190621 key: test_fscore value: [0.94736842 0.95575221 0.94736842 1. 0.97297297 0.99099099 0.94017094 0.94827586 0.99099099 0.94827586] mean value: 0.964216667375847 key: train_fscore value: [0.98792757 0.98495486 0.98693467 0.98396794 0.98495486 0.98393574 0.98492462 0.98591549 0.98294885 0.98690836] mean value: 0.9853372967912892 key: test_precision value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286 0.88709677 0.90163934 0.98214286 0.90163934] mean value: 0.931728383534462 key: train_precision value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945 0.97029703 0.97222222 0.96646943 0.97415507] mean value: 0.9711025963561587 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94545455 0.95454545 0.94545455 1. 0.97272727 0.99074074 0.93518519 0.94444444 0.99074074 0.94444444] mean value: 0.9623737373737373 key: train_roc_auc value: [0.9877551 0.98469388 0.98673469 0.98367347 0.98469388 0.98370672 0.98472505 0.98574338 0.98268839 0.98676171] mean value: 0.9851176274990648 key: test_jcc value: [0.9 0.91525424 0.9 1. 0.94736842 0.98214286 0.88709677 0.90163934 0.98214286 0.90163934] mean value: 0.931728383534462 key: train_jcc value: [0.97614314 0.97035573 0.97420635 0.96844181 0.97035573 0.96837945 0.97029703 0.97222222 0.96646943 0.97415507] mean value: 0.9711025963561587 MCC on Blind test: 0.28 Accuracy on Blind test: 0.61 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01576996 0.0410161 0.04309177 0.01604724 0.01587033 0.01609635 0.0160675 0.01610398 0.01614022 0.01605773] mean value: 0.021226119995117188 key: score_time value: [0.01217413 0.01242757 0.01232624 0.01218367 0.01216888 0.01491857 0.01216054 0.01213479 0.01214075 0.01212192] mean value: 0.012475705146789551 key: test_mcc value: [0.68811051 0.67128761 0.72482321 0.65151515 0.56345904 0.67025938 0.74368478 0.71100747 0.67172877 0.72598622] mean value: 0.6821862154657743 key: train_mcc value: [0.68441792 0.69257749 0.67825101 0.68233052 0.69490253 0.68014841 0.67406705 0.68422743 0.68630482 0.68439564] mean value: 0.6841622811613333 key: test_accuracy value: [0.8440367 0.83486239 0.86238532 0.82568807 0.77981651 0.83486239 0.87155963 0.85321101 0.83486239 0.86238532] mean value: 0.8403669724770643 key: train_accuracy value: [0.84199796 0.84607543 0.83893986 0.84097859 0.8470948 0.83995923 0.83690112 0.84199796 0.84301733 0.84199796] mean value: 0.8418960244648318 key: test_fscore value: [0.8411215 0.82692308 0.85981308 0.82568807 0.78947368 0.83333333 0.87037037 0.84615385 0.83018868 0.85981308] mean value: 0.8382878727182334 key: train_fscore value: [0.83937824 0.84352332 0.83643892 0.83850932 0.84375 0.83764219 0.83436853 0.83971044 0.84057971 0.83904465] mean value: 0.8392945323885889 key: test_precision value: [0.8490566 0.86 0.86792453 0.81818182 0.75 0.8490566 0.88679245 0.89795918 0.8627451 0.88461538] mean value: 0.8526331673189134 key: train_precision value: [0.85443038 0.85864979 0.85052632 0.85263158 0.86353945 0.8490566 0.84663866 0.85115304 0.85294118 0.85412262] mean value: 0.8533689606245336 key: test_recall value: [0.83333333 0.7962963 0.85185185 0.83333333 0.83333333 0.81818182 0.85454545 0.8 0.8 0.83636364] mean value: 0.8257239057239057 key: train_recall value: [0.82484725 0.82892057 0.82281059 0.82484725 0.82484725 0.82653061 0.82244898 0.82857143 0.82857143 0.8244898 ] mean value: 0.8256885157321584 key: test_roc_auc value: [0.84393939 0.83451178 0.86228956 0.82575758 0.78030303 0.83501684 0.87171717 0.8537037 0.83518519 0.86262626] mean value: 0.8405050505050505 key: train_roc_auc value: [0.84201546 0.84609294 0.83895632 0.84099505 0.8471175 0.83994555 0.8368864 0.84198429 0.84300262 0.84198013] mean value: 0.8418976266677751 key: test_jcc value: [0.72580645 0.70491803 0.75409836 0.703125 0.65217391 0.71428571 0.7704918 0.73333333 0.70967742 0.75409836] mean value: 0.7222008389007317 key: train_jcc value: [0.72321429 0.72939068 0.71886121 0.72192513 0.72972973 0.72064057 0.71580817 0.72370766 0.725 
0.72271914] mean value: 0.7230996586219895 MCC on Blind test: 0.45 Accuracy on Blind test: 0.71 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0369277 0.04088545 0.03215814 0.03854275 0.03352094 0.03347206 0.0334053 0.02965021 0.04916883 0.03165388] mean value: 0.03593852519989014 key: score_time value: [0.01192832 0.01221871 0.01212454 0.01213908 0.01212454 0.0121572 0.0121696 0.01220918 0.01216412 0.01215053] mean value: 0.012138581275939942 key: test_mcc value: [0.96396098 0.96396098 0.98181818 0.98181818 0.946411 0.98181211 0.92905936 0.98181211 1. 
0.92905936] mean value: 0.9659712270812341 key: train_mcc value: [0.99796333 0.99796333 0.98784133 0.98784133 0.97582662 0.98985784 0.98985784 0.9918781 0.99593082 0.9918781 ] mean value: 0.9906838647005597 key: test_accuracy value: [0.98165138 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569 0.96330275 0.99082569 1. 0.96330275] mean value: 0.9825688073394496 key: train_accuracy value: [0.99898063 0.99898063 0.99388379 0.99388379 0.98776758 0.99490316 0.99490316 0.99592253 0.99796126 0.99592253] mean value: 0.9953109072375127 key: test_fscore value: [0.98181818 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099 0.96491228 0.99099099 1. 0.96491228] mean value: 0.9830067256141616 key: train_fscore value: [0.99898271 0.99898271 0.99392713 0.99392713 0.98792757 0.99492386 0.99492386 0.99593496 0.99796334 0.99593496] mean value: 0.9953428202965996 key: test_precision value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286 0.93220339 0.98214286 1. 0.93220339] mean value: 0.9668268707207155 key: train_precision value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899 0.98989899 0.99190283 0.99593496 0.99190283] mean value: 0.9907471838451151 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98181818 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074 0.96296296 0.99074074 1. 0.96296296] mean value: 0.9825589225589225 key: train_roc_auc value: [0.99897959 0.99897959 0.99387755 0.99387755 0.9877551 0.99490835 0.99490835 0.99592668 0.99796334 0.99592668] mean value: 0.9953102788977097 key: test_jcc value: [0.96428571 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286 0.93220339 0.98214286 1. 
0.93220339] mean value: 0.9668268707207155 key: train_jcc value: [0.99796748 0.99796748 0.98792757 0.98792757 0.97614314 0.98989899 0.98989899 0.99190283 0.99593496 0.99190283] mean value: 0.9907471838451151 MCC on Blind test: 0.26 Accuracy on Blind test: 0.58 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02573824 0.02769852 0.02286506 0.0265317 0.02289176 0.02876425 0.02368808 0.02303743 0.02485371 0.02317619] mean value: 0.024924492835998534 key: score_time value: [0.0121398 0.01220655 0.01216602 0.01215792 0.01220179 0.01250315 0.01214266 0.01208639 0.01215744 0.01210952] mean value: 0.01218712329864502 key: test_mcc value: [0.946411 0.96396098 0.98181818 0.98181818 0.946411 0.98181211 0.91202529 0.98181211 1. 
0.92905936] mean value: 0.9625128216673053 key: train_mcc value: [0.99593079 0.99593079 0.96789422 0.98382069 0.98784133 0.99796334 0.97981668 0.98582943 0.99593082 0.96592295] mean value: 0.9856881042566935 key: test_accuracy value: [0.97247706 0.98165138 0.99082569 0.99082569 0.97247706 0.99082569 0.95412844 0.99082569 1. 0.96330275] mean value: 0.9807339449541285 key: train_accuracy value: [0.99796126 0.99796126 0.98369011 0.99184506 0.99388379 0.99898063 0.98980632 0.99286442 0.99796126 0.98267074] mean value: 0.9927624872579001 key: test_fscore value: [0.97297297 0.98181818 0.99082569 0.99082569 0.97297297 0.99099099 0.95652174 0.99099099 1. 0.96491228] mean value: 0.9812831505725088 key: train_fscore value: [0.99796748 0.99796748 0.98396794 0.99191919 0.99392713 0.99898063 0.98989899 0.9929078 0.99796334 0.98294885] mean value: 0.9928448822634005 key: test_precision value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286 0.91666667 0.98214286 1. 0.93220339] mean value: 0.963581469081023 key: train_precision value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334 0.98 0.98591549 0.99593496 0.96646943] mean value: 0.9858506946033496 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.98181818 0.99090909 0.99090909 0.97272727 0.99074074 0.9537037 0.99074074 1. 0.96296296] mean value: 0.9807239057239057 key: train_roc_auc value: [0.99795918 0.99795918 0.98367347 0.99183673 0.99387755 0.99898167 0.9898167 0.99287169 0.99796334 0.98268839] mean value: 0.9927627914709672 key: test_jcc value: [0.94736842 0.96428571 0.98181818 0.98181818 0.94736842 0.98214286 0.91666667 0.98214286 1. 0.93220339] mean value: 0.963581469081023 key: train_jcc value: [0.9959432 0.9959432 0.96844181 0.98396794 0.98792757 0.99796334 0.98 0.98591549 0.99593496 0.96646943] mean value: 0.9858506946033496 MCC on Blind test: 0.29 Accuracy on Blind test: 0.6 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.28126764 0.26553988 0.26455736 0.26455355 0.27714682 0.26926732 0.26987267 0.26936221 0.27386642 0.26629829] mean value: 0.2701732158660889 key: score_time value: [0.01591969 0.01593041 0.01601982 0.01616859 0.01612663 0.01737332 0.01592994 0.01732421 0.01614046 0.01601481] mean value: 0.01629478931427002 key: test_mcc value: [0.9291517 0.96396098 1. 1. 0.96396098 1. 0.96393713 0.98181211 1. 0.98181211] mean value: 0.9784635026184957 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96330275 0.98165138 1. 1. 0.98165138 1. 0.98165138 0.99082569 1. 0.99082569] mean value: 0.9889908256880734 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.98181818 1. 1. 0.98181818 1. 0.98214286 0.99099099 1. 0.99099099] mean value: 0.9892046917046917 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93103448 0.96428571 1. 1. 0.96428571 1. 0.96491228 0.98214286 1. 0.98214286] mean value: 0.9788803906317518 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96363636 0.98181818 1. 1. 0.98181818 1. 0.98148148 0.99074074 1. 0.99074074] mean value: 0.989023569023569 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.96428571 1. 1. 0.96428571 1. 0.96491228 0.98214286 1. 0.98214286] mean value: 0.9788803906317518 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.27 Accuracy on Blind test: 0.59 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
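The UserWarning above says that too few estimators were used for reliable OOB estimates: with few bagged estimators some training samples are never left out of a bootstrap, so their OOB predictions are undefined, which is also why the true_divide RuntimeWarning follows. A minimal sketch, assuming nothing about the project's code beyond the printed BaggingClassifier settings, of raising n_estimators so the OOB score becomes well defined:

    # Sketch only: more base estimators make it far likelier that every sample
    # is out-of-bag for at least one estimator, avoiding the OOB warnings.
    from sklearn.ensemble import BaggingClassifier

    bc = BaggingClassifier(
        n_estimators=100,      # illustrative value; the log run uses the default
        n_jobs=10,
        oob_score=True,
        random_state=42,
    )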
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.17862988 0.20598102 0.20765114 0.20498371 0.1792326 0.2004025 0.20543003 0.20129752 0.20281935 0.20100617] mean value: 0.1987433910369873 key: score_time value: [0.02799296 0.04068851 0.04115677 0.04074216 0.04036856 0.03043032 0.03932166 0.04003978 0.03971243 0.0401566 ] mean value: 0.038060975074768064 key: test_mcc value: [0.91216737 0.96396098 1. 1. 0.98181818 1. 0.94635821 0.98181211 0.96393713 0.94635821] mean value: 0.9696412205023645 key: train_mcc value: [1. 0.99796333 0.99593079 0.99593079 0.99390234 0.99593082 0.99390241 0.99593082 0.99796334 0.9918781 ] mean value: 0.9959332732076022 key: test_accuracy value: [0.95412844 0.98165138 1. 1. 0.99082569 1. 0.97247706 0.99082569 0.98165138 0.97247706] mean value: 0.9844036697247707 key: train_accuracy value: [1. 0.99898063 0.99796126 0.99796126 0.9969419 0.99796126 0.9969419 0.99796126 0.99898063 0.99592253] mean value: 0.9979612640163099 key: test_fscore value: [0.95575221 0.98181818 1. 1. 0.99082569 1. 0.97345133 0.99099099 0.98214286 0.97345133] mean value: 0.9848432585282061 key: train_fscore value: [1. 0.99898271 0.99796748 0.99796748 0.99695431 0.99796334 0.99694812 0.99796334 0.99898063 0.99593496] mean value: 0.9979662369680692 key: test_precision value: [0.91525424 0.96428571 1. 1. 0.98181818 1. 0.94827586 0.98214286 0.96491228 0.94827586] mean value: 0.9704964995374574 key: train_precision value: [1. 0.99796748 0.9959432 0.9959432 0.99392713 0.99593496 0.99391481 0.99593496 0.99796334 0.99190283] mean value: 0.9959431915048893 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.98181818 1. 1. 0.99090909 1. 0.97222222 0.99074074 0.98148148 0.97222222] mean value: 0.9843939393939394 key: train_roc_auc value: [1. 0.99897959 0.99795918 0.99795918 0.99693878 0.99796334 0.99694501 0.99796334 0.99898167 0.99592668] mean value: 0.9979616775427075 key: test_jcc value: [0.91525424 0.96428571 1. 1. 0.98181818 1. 0.94827586 0.98214286 0.96491228 0.94827586] mean value: 0.9704964995374574 key: train_jcc value: [1. 
0.99796748 0.9959432 0.9959432 0.99392713 0.99593496 0.99391481 0.99593496 0.99796334 0.99190283] mean value: 0.9959431915048893 MCC on Blind test: 0.25 Accuracy on Blind test: 0.57 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.47767258 0.4457314 0.63524914 0.67361498 0.61306643 0.56355143 0.53762197 0.56128907 0.43451262 0.60698318] mean value: 0.5549292802810669 key: score_time value: [0.02317858 0.026896 0.04104447 0.04110169 0.04912281 0.04197502 0.04222465 0.0552702 0.03481197 0.041291 ] mean value: 0.0396916389465332 key: test_mcc value: [0.946411 0.96396098 0.98181818 1. 0.98181818 1. 0.92905936 0.98181211 1. 
0.98181211] mean value: 0.9766691930577851 key: train_mcc value: [0.99796333 0.99593079 0.99390234 0.99390234 0.99390234 0.99390241 0.99390241 0.99390241 0.99390241 0.99390241] mean value: 0.9945113200343811 key: test_accuracy value: [0.97247706 0.98165138 0.99082569 1. 0.99082569 1. 0.96330275 0.99082569 1. 0.99082569] mean value: 0.9880733944954129 key: train_accuracy value: [0.99898063 0.99796126 0.9969419 0.9969419 0.9969419 0.9969419 0.9969419 0.9969419 0.9969419 0.9969419 ] mean value: 0.9972477064220183 key: test_fscore value: [0.97297297 0.98181818 0.99082569 1. 0.99082569 1. 0.96491228 0.99099099 1. 0.99099099] mean value: 0.988333679362168 key: train_fscore value: [0.99898271 0.99796748 0.99695431 0.99695431 0.99695431 0.99694812 0.99694812 0.99694812 0.99694812 0.99694812] mean value: 0.9972553719869787 key: test_precision value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1. 0.93220339 0.98214286 1. 0.98214286] mean value: 0.9771779603090932 key: train_precision value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481 0.99391481 0.99391481 0.99391481 0.99391481] mean value: 0.9945266097572326 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.98181818 0.99090909 1. 0.99090909 1. 0.96296296 0.99074074 1. 0.99074074] mean value: 0.9880808080808081 key: train_roc_auc value: [0.99897959 0.99795918 0.99693878 0.99693878 0.99693878 0.99694501 0.99694501 0.99694501 0.99694501 0.99694501] mean value: 0.9972480152957314 key: test_jcc value: [0.94736842 0.96428571 0.98181818 1. 0.98181818 1. 0.93220339 0.98214286 1. 0.98214286] mean value: 0.9771779603090932 key: train_jcc value: [0.99796748 0.9959432 0.99392713 0.99392713 0.99392713 0.99391481 0.99391481 0.99391481 0.99391481 0.99391481] mean value: 0.9945266097572326 MCC on Blind test: 0.16 Accuracy on Blind test: 0.54 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.11533022 1.10044742 1.10230374 1.10466123 1.10110593 1.12116337 1.1251564 1.11318278 1.11536574 1.13156414] mean value: 1.11302809715271 key: score_time value: [0.00962567 0.00945687 0.00967073 0.00957179 0.00962257 0.00992775 0.010005 0.0098331 0.01052403 0.01036096] mean value: 0.009859848022460937 key: test_mcc value: [0.9291517 0.98181818 1. 0.98181818 0.946411 0.98181211 0.94635821 0.96393713 0.98181211 0.94635821] mean value: 0.9659476849273602 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96330275 0.99082569 1. 0.99082569 0.97247706 0.99082569 0.97247706 0.98165138 0.99082569 0.97247706] mean value: 0.9825688073394496 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.99082569 1. 0.99082569 0.97297297 0.99099099 0.97345133 0.98214286 0.99099099 0.97345133] mean value: 0.9829937557397572 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286 0.94827586 0.96491228 0.98214286 0.94827586] mean value: 0.9667788986573016 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96363636 0.99090909 1. 0.99090909 0.97272727 0.99074074 0.97222222 0.98148148 0.99074074 0.97222222] mean value: 0.9825589225589225 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.98181818 1. 0.98181818 0.94736842 0.98214286 0.94827586 0.96491228 0.98214286 0.94827586] mean value: 0.9667788986573016 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.59 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04244852 0.04812479 0.04577637 0.0463593 0.04909849 0.04620719 0.04703236 0.05215859 0.04779005 0.04598045] mean value: 0.04709761142730713 key: score_time value: [0.01303792 0.02137995 0.01299429 0.01325035 0.02061152 0.01301885 0.01324797 0.01297951 0.02275825 0.01454926] mean value: 0.015782785415649415 key: test_mcc value: [1. 1. 1. 1. 1. 1. 0.34851405 1. 1. 1. ] mean value: 0.9348514050091998 key: train_mcc value: [1. 1. 1. 1. 1. 0.96198449 0.35025059 1. 1. 1. ] mean value: 0.9312235085917535 key: test_accuracy value: [1. 1. 1. 1. 1. 1. 0.60550459 1. 1. 1. ] mean value: 0.9605504587155963 key: train_accuracy value: [1. 1. 1. 1. 1. 0.98063201 0.60958206 1. 1. 1. 
] mean value: 0.9590214067278288 key: test_fscore value: [1. 1. 1. 1. 1. 1. 0.35820896 1. 1. 1. ] mean value: 0.935820895522388 key: train_fscore value: [1. 1. 1. 1. 1. 0.98022893 0.35845896 1. 1. 1. ] mean value: 0.9338687889673829 key: test_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 0.21818182 1. 1. 1. ] mean value: 0.9218181818181819 key: train_recall value: [1. 1. 1. 1. 1. 0.96122449 0.21836735 1. 1. 1. ] mean value: 0.9179591836734694 key: test_roc_auc value: [1. 1. 1. 1. 1. 1. 0.60909091 1. 1. 1. ] mean value: 0.9609090909090909 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.98061224 0.60918367 1. 1. 1. ] mean value: 0.9589795918367346 key: test_jcc value: [1. 1. 1. 1. 1. 1. 0.21818182 1. 1. 1. ] mean value: 0.9218181818181819 key: train_jcc value: [1. 1. 1. 1. 1. 0.96122449 0.21836735 1. 1. 1. ] mean value: 0.9179591836734694 MCC on Blind test: 0.0 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02674341 0.03784299 0.04292107 0.01843095 0.02614713 0.0395968 0.01892185 0.01898193 0.01882291 0.02836633] mean value: 0.027677536010742188 key: score_time value: [0.01909041 0.01905203 0.01903725 0.01596856 0.01496267 0.0134511 0.01707458 0.01234651 0.0223639 0.02151322] mean value: 0.017486023902893066 key: test_mcc value: [0.946411 0.9291517 1. 0.98181818 0.946411 0.98181211 0.91202529 0.94635821 1. 0.91202529] mean value: 0.9556012783200151 key: train_mcc value: [0.96789422 0.96592058 0.96789422 0.96592058 0.96789422 0.96592295 0.96987347 0.96987347 0.96198744 0.97185442] mean value: 0.9675035590265852 key: test_accuracy value: [0.97247706 0.96330275 1. 0.99082569 0.97247706 0.99082569 0.95412844 0.97247706 1. 0.95412844] mean value: 0.9770642201834863 key: train_accuracy value: [0.98369011 0.98267074 0.98369011 0.98267074 0.98369011 0.98267074 0.98470948 0.98470948 0.98063201 0.98572885] mean value: 0.9834862385321101 key: test_fscore value: [0.97297297 0.96428571 1. 0.99082569 0.97297297 0.99099099 0.95652174 0.97345133 1. 0.95652174] mean value: 0.9778543144990544 key: train_fscore value: [0.98396794 0.98298298 0.98396794 0.98298298 0.98396794 0.98294885 0.98492462 0.98492462 0.98098098 0.98591549] mean value: 0.9837564340290699 key: test_precision value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286 0.91666667 0.94827586 1. 0.91666667] mean value: 0.9571341559227221 key: train_precision value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943 0.97029703 0.97029703 0.96267191 0.97222222] mean value: 0.9680353925262213 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.96363636 1. 0.99090909 0.97272727 0.99074074 0.9537037 0.97222222 1. 0.9537037 ] mean value: 0.977037037037037 key: train_roc_auc value: [0.98367347 0.98265306 0.98367347 0.98265306 0.98367347 0.98268839 0.98472505 0.98472505 0.98065173 0.98574338] mean value: 0.9834860135500229 key: test_jcc value: [0.94736842 0.93103448 1. 0.98181818 0.94736842 0.98214286 0.91666667 0.94827586 1. 
0.91666667] mean value: 0.9571341559227221 key: train_jcc value: [0.96844181 0.96653543 0.96844181 0.96653543 0.96844181 0.96646943 0.97029703 0.97029703 0.96267191 0.97222222] mean value: 0.9680353925262213 MCC on Blind test: 0.3 Accuracy on Blind test: 0.61 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:155: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py:158: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.26458216 0.32423019 0.28895402 0.36766052 0.35149288 0.44386435 0.33424354 0.35568094 0.34150338 0.36406851] mean value: 0.34362804889678955 key: score_time value: [0.01919508 0.01234579 0.0192287 0.02447915 0.01916623 0.01922035 0.0209403 0.01921153 0.02182055 0.01927543] mean value: 0.019488310813903807 key: test_mcc value: [0.946411 0.9291517 1. 0.98181818 0.96396098 0.94635821 0.91202529 0.94635821 0.98181211 0.91202529] mean value: 0.9519920982298953 key: train_mcc value: [0.960022 0.96592058 0.956108 0.95806317 0.960022 0.96198744 0.96198744 0.96987347 0.95611192 0.96592295] mean value: 0.961601897888028 key: test_accuracy value: [0.97247706 0.96330275 1. 0.99082569 0.98165138 0.97247706 0.95412844 0.97247706 0.99082569 0.95412844] mean value: 0.9752293577981652 key: train_accuracy value: [0.97961264 0.98267074 0.9775739 0.97859327 0.97961264 0.98063201 0.98063201 0.98470948 0.9775739 0.98267074] mean value: 0.980428134556575 key: test_fscore value: [0.97297297 0.96428571 1. 0.99082569 0.98181818 0.97345133 0.95652174 0.97345133 0.99099099 0.95652174] mean value: 0.9760839681269381 key: train_fscore value: [0.98003992 0.98298298 0.97808765 0.97906281 0.98003992 0.98098098 0.98098098 0.98492462 0.97804391 0.98294885] mean value: 0.9808092628062846 key: test_precision value: [0.94736842 0.93103448 1. 0.98181818 0.96428571 0.94827586 0.91666667 0.94827586 0.98214286 0.91666667] mean value: 0.953653471452927 key: train_precision value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191 0.96267191 0.97029703 0.95703125 0.96646943] mean value: 0.9623498450426142 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97272727 0.96363636 1. 0.99090909 0.98181818 0.97222222 0.9537037 0.97222222 0.99074074 0.9537037 ] mean value: 0.9751683501683501 key: train_roc_auc value: [0.97959184 0.98265306 0.97755102 0.97857143 0.97959184 0.98065173 0.98065173 0.98472505 0.97759674 0.98268839] mean value: 0.9804272829294651 key: test_jcc value: [0.94736842 0.93103448 1. 
0.98181818 0.96428571 0.94827586 0.91666667 0.94827586 0.98214286 0.91666667] mean value: 0.953653471452927 key: train_jcc value: [0.96086106 0.96653543 0.95711501 0.95898438 0.96086106 0.96267191 0.96267191 0.97029703 0.95703125 0.96646943] mean value: 0.9623498450426142 MCC on Blind test: 0.3 Accuracy on Blind test: 0.61 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01922441 0.01973104 0.0211575 0.02013302 0.02118015 0.01990414 0.01957083 0.02115703 0.01995182 0.02084684] mean value: 0.020285677909851075 key: score_time value: [0.01134562 0.01130247 0.01128507 0.01133657 0.01124454 0.0112586 0.01131725 0.01126814 0.01121068 0.01123452] mean value: 0.011280345916748046 key: test_mcc value: [0.70710678 0.4472136 1. 0.70710678 0.70710678 1. 
0.70710678 1. 0.33333333 0.33333333] mean value: 0.6942307386912815 key: train_mcc value: [0.96362411 1. 0.96362411 0.96362411 1. 0.96362411 0.96362411 0.96362411 0.96362411 1. ] mean value: 0.9745368781616021 key: test_accuracy value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1. 0.83333333 1. 0.66666667 0.66666667] mean value: 0.8333333333333334 key: train_accuracy value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148 0.98148148 0.98148148 0.98148148 1. ] mean value: 0.987037037037037 key: test_fscore value: [0.8 0.75 1. 0.8 0.8 1. 0.85714286 1. 0.66666667 0.66666667] mean value: 0.834047619047619 key: train_fscore value: [0.98181818 1. 0.98181818 0.98181818 1. 0.98181818 0.98181818 0.98181818 0.98181818 1. ] mean value: 0.9872727272727273 key: test_precision value: [1. 0.6 1. 1. 1. 1. 0.75 1. 0.66666667 0.66666667] mean value: 0.8683333333333333 key: train_precision value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.975 key: test_recall value: [0.66666667 1. 1. 0.66666667 0.66666667 1. 1. 1. 0.66666667 0.66666667] mean value: 0.8333333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.66666667 1. 0.83333333 0.83333333 1. 0.83333333 1. 0.66666667 0.66666667] mean value: 0.8333333333333334 key: train_roc_auc value: [0.98148148 1. 0.98148148 0.98148148 1. 0.98148148 0.98148148 0.98148148 0.98148148 1. ] mean value: 0.987037037037037 key: test_jcc value: [0.66666667 0.6 1. 0.66666667 0.66666667 1. 0.75 1. 0.5 0.5 ] mean value: 0.735 key: train_jcc value: [0.96428571 1. 0.96428571 0.96428571 1. 0.96428571 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.975
Traceback (most recent call last):
  File "/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_rt.py", line 165, in <module>
    mm_skf_scoresD4 = MultModelsCl(input_df = X_rus
  File "/home/tanu/git/LSHTM_analysis/scripts/ml/MultModelsCl.py", line 240, in MultModelsCl
    bts_predict = model_pipeline.predict(blind_test_input_df)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 457, in predict
    Xt = transform.transform(Xt)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 746, in transform
    Xs = self._fit_transform(
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/compose/_column_transformer.py", line 604, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 1046, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/pipeline.py", line 853, in _transform_one
    res = transformer.transform(X)
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 882, in transform
    X_int, X_mask = self._transform(
  File "/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 160, in _transform
    raise ValueError(msg)
ValueError: Found unknown categories ['Other'] in column 5 during transform