/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_orig.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1

aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274

count of NULL values before imputation
or_mychisq          339
log10_or_mychisq    339
dtype: int64

count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7

index: 0 ind: 1 Mask count check: True
index: 1 ind: 2 Mask count check: True
index: 2 ind: 3 Mask count check: True

Original Data
Counter({0: 282, 1: 275})
Data dim: (557, 176)
-------------------------------------------------------------
Successfully split data: ORIGINAL training
actual values: training set
imputed values: blind test set
Train data size: (557, 176)
Test data size: (575, 176)
y_train numbers: Counter({0: 282, 1: 275})
y_train ratio: 1.0254545454545454
y_test_numbers: Counter({0: 545, 1: 30})
y_test ratio: 18.166666666666668
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 282, 1: 282})
(564, 176)
Simple Random UnderSampling
Counter({0: 275, 1: 275})
(550, 176)
Simple Combined Over and UnderSampling
Counter({0: 282, 1: 282})
(564, 176)
SMOTE_NC OverSampling
Counter({0: 282, 1: 282})
(564, 176)
#####################################################################
Running ML analysis: ORIGINAL
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_orig/
Sanity checks:
Total input features: 176
Training data size: (557, 176)
Test data size: (575, 176)
Target feature numbers (training data): Counter({0: 282, 1: 275})
Target features ratio (training data: 1.0254545454545454
Target feature numbers (test data): Counter({0: 545, 1: 30})
Target features ratio (test data): 18.166666666666668
#####################################################################
================================================================
Strucutral features (n): 37
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr',
'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline:
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03571892 0.03646374 0.03661752 0.03673768 0.03027439 0.03322029 0.03499603 0.03590631 0.06422377 0.04940248]
mean value: 0.039356112480163574

key: score_time
value: [0.01237988 0.01195502 0.01191425 0.01493478 0.01193166 0.01194549 0.01192307 0.01487398 0.01554179 0.01552892]
mean value: 0.013292884826660157

key: test_mcc
value: [0.96490128 0.78544061 0.89342711 0.71428571 0.67900461 0.89802651 0.71611487 0.78174603 0.82337971 0.8565805 ]
mean value: 0.8112906954356885

key: train_mcc
value: [0.85235948 0.86826252 0.8522816 0.85627063 0.87227261 0.85317946 0.85640062 0.86858794 0.86061924 0.85683896]
mean value: 0.8597073065379107

key: test_accuracy
value: [0.98214286 0.89285714 0.94642857 0.85714286 0.83928571 0.94642857 0.85714286 0.89090909 0.90909091 0.92727273]
mean value: 0.9048701298701298

key: train_accuracy
value: [0.9261477 0.93413174 0.9261477 0.92814371 0.93612774 0.9261477 0.92814371 0.93426295 0.93027888 0.92828685]
mean value: 0.9297818705219044

key: test_fscore
value: [0.98181818 0.88888889 0.94736842 0.85714286 0.84210526 0.94339623 0.86206897 0.88888889 0.9122807 0.92307692]
mean value: 0.9047035317712988

key: train_fscore
value: [0.9258517 0.93360161 0.92525253 0.92682927 0.93548387 0.92673267 0.92771084 0.93386774 0.92985972 0.92828685]
mean value: 0.9293476801717994

key: test_precision
value: [0.96428571 0.88888889 0.93103448 0.85714286 0.82758621 1. 0.83333333 0.88888889 0.86666667 0.96 ]
mean value: 0.9017827038861521

key: train_precision
value: [0.92031873 0.93172691 0.9233871 0.93061224 0.93172691 0.90697674 0.92031873 0.92828685 0.92430279 0.91732283]
mean value: 0.9234979827398379

key: test_recall
value: [1. 0.88888889 0.96428571 0.85714286 0.85714286 0.89285714 0.89285714 0.88888889 0.96296296 0.88888889]
mean value: 0.9093915343915344

key: train_recall
value: [0.93145161 0.93548387 0.92712551 0.92307692 0.93927126 0.94736842 0.93522267 0.93951613 0.93548387 0.93951613]
mean value: 0.9353516390231161

key: test_roc_auc
value: [0.98275862 0.89272031 0.94642857 0.85714286 0.83928571 0.94642857 0.85714286 0.89087302 0.91005291 0.9265873 ]
mean value: 0.9049420726144864

key: train_roc_auc
value: [0.92620011 0.9341451 0.92616118 0.92807389 0.93617106 0.92644012 0.92824126 0.93432499 0.93034036 0.92841948]
mean value: 0.9298517555235091

key: test_jcc
value: [0.96428571 0.8 0.9 0.75 0.72727273 0.89285714 0.75757576 0.8 0.83870968 0.85714286]
mean value: 0.8287843876553554

key: train_jcc
value: [0.8619403 0.8754717 0.86090226 0.86363636 0.87878788 0.86346863 0.86516854 0.87593985 0.86891386 0.866171 ]
mean value: 0.8680400379715635

MCC on Blind test: 0.28
Accuracy on Blind test: 0.7

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.90763521 0.90286136 1.07441068 0.9007256 1.06581187 0.93487668 1.07855535 0.91051745 0.9207058 0.9194684 ] mean value: 0.9615568399429322 key: score_time value: [0.01460195 0.01202965 0.01527715 0.01221609 0.01532865 0.01546383 0.01225019 0.0153358 0.01221108 0.01221395] mean value: 0.013692831993103028 key: test_mcc value: [0.96490128 0.74984143 0.89342711 0.75047877 0.75047877 0.89802651 0.75047877 0.74569602 0.75033796 0.8565805 ] mean value: 0.8110247141410594 key: train_mcc value: [0.87624709 0.83235852 0.90022801 0.81633141 0.95211147 0.89227656 0.88021614 0.90436881 0.82867552 0.83278304] mean value: 0.8715596565224771 key: test_accuracy value: [0.98214286 0.875 0.94642857 0.875 0.875 0.94642857 0.875 0.87272727 0.87272727 0.92727273] mean value: 0.9047727272727273 key: train_accuracy value: [0.93812375 0.91616766 0.9500998 0.90818363 0.9760479 0.94610778 0.94011976 0.95219124 0.91434263 0.91633466] mean value: 0.9357718825297612 key: test_fscore value: [0.98181818 0.86792453 0.94736842 0.87719298 0.87272727 0.94339623 0.87719298 0.86792453 0.87719298 0.92307692] mean value: 0.9035815029062298 key: train_fscore value: [0.93762575 0.91566265 0.9490835 0.90688259 0.97560976 0.94567404 0.93927126 0.9516129 0.91348089 0.916 ] mean value: 0.9350903343239241 key: test_precision value: [0.96428571 0.88461538 0.93103448 0.86206897 0.88888889 1. 
0.86206897 0.88461538 0.83333333 0.96 ] mean value: 0.9070911119531809 key: train_precision value: [0.93574297 0.912 0.95491803 0.90688259 0.97959184 0.94 0.93927126 0.9516129 0.91164659 0.90873016] mean value: 0.9340396335864323 key: test_recall value: [1. 0.85185185 0.96428571 0.89285714 0.85714286 0.89285714 0.89285714 0.85185185 0.92592593 0.88888889] mean value: 0.9018518518518519 key: train_recall value: [0.93951613 0.91935484 0.94331984 0.90688259 0.97165992 0.951417 0.93927126 0.9516129 0.91532258 0.9233871 ] mean value: 0.9361744155674546 key: test_roc_auc value: [0.98275862 0.87420179 0.94642857 0.875 0.875 0.94642857 0.875 0.8723545 0.87367725 0.9265873 ] mean value: 0.9047436599160738 key: train_roc_auc value: [0.93813751 0.91619916 0.95000638 0.9081657 0.97598744 0.94618094 0.94010807 0.9521844 0.9143542 0.91641796] mean value: 0.9357741767545031 key: test_jcc value: [0.96428571 0.76666667 0.9 0.78125 0.77419355 0.89285714 0.78125 0.76666667 0.78125 0.85714286] mean value: 0.8265562596006144 key: train_jcc value: [0.88257576 0.84444444 0.90310078 0.82962963 0.95238095 0.89694656 0.88549618 0.90769231 0.84074074 0.84501845] mean value: 0.8788025805933736 MCC on Blind test: 0.28 Accuracy on Blind test: 0.69 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0148561 0.01195788 0.01015806 0.01290393 0.01003218 0.01007318 0.01003528 0.01038003 0.01010847 0.01045537] mean value: 0.011096048355102538 key: score_time value: [0.01212811 0.01027489 0.00905752 0.00897837 0.00887227 0.00879979 0.00898051 0.00892806 0.00926328 0.00886559] mean value: 0.00941483974456787 key: test_mcc value: [0.79257331 0.44074684 0.55328334 0.64951905 0.3992747 0.60753044 0.73127242 0.52715278 0.78174603 0.78353876] mean value: 0.6266637668473235 key: train_mcc value: [0.65574845 0.66306345 0.63858187 0.68063475 0.65119268 0.6812805 0.66951249 0.68417806 0.6328508 0.68226627] mean value: 0.6639309307086467 key: test_accuracy value: [0.89285714 0.71428571 0.76785714 0.82142857 0.69642857 0.80357143 0.85714286 0.76363636 0.89090909 0.89090909] mean value: 0.8099025974025974 key: train_accuracy value: [0.8243513 0.82834331 0.81636727 0.83632735 0.81437126 0.83832335 0.83233533 0.83864542 0.812749 0.83665339] mean value: 0.8278466970441587 key: test_fscore value: [0.88 0.65217391 0.73469388 0.80769231 0.66666667 0.8 0.84 0.75471698 0.88888889 0.88461538] mean value: 0.7909448019589822 key: train_fscore value: [0.80786026 0.81304348 0.79912664 0.81938326 0.78220141 0.825054 0.81818182 0.82352941 0.79385965 0.81938326] mean value: 0.8101623177549878 key: test_precision value: [0.95652174 0.78947368 0.85714286 0.875 0.73913043 0.81481481 0.95454545 0.76923077 0.88888889 0.92 ] mean value: 0.8564748642746355 key: train_precision value: [0.88095238 0.88207547 0.86729858 0.89855072 0.92777778 0.88425926 0.87906977 0.8957346 0.87019231 0.90291262] mean value: 
0.8888823486174054 key: test_recall value: [0.81481481 0.55555556 0.64285714 0.75 0.60714286 0.78571429 0.75 0.74074074 0.88888889 0.85185185] mean value: 0.7387566137566137 key: train_recall value: [0.74596774 0.75403226 0.74089069 0.75303644 0.67611336 0.77327935 0.76518219 0.76209677 0.72983871 0.75 ] mean value: 0.7450437508162466 key: test_roc_auc value: [0.89016603 0.70881226 0.76785714 0.82142857 0.69642857 0.80357143 0.85714286 0.76322751 0.89087302 0.89021164] mean value: 0.8089719029374202 key: train_roc_auc value: [0.82357676 0.82760901 0.81532723 0.83517964 0.81246613 0.83742708 0.83140999 0.8377413 0.81176975 0.83562992] mean value: 0.8268136808296788 key: test_jcc value: [0.78571429 0.48387097 0.58064516 0.67741935 0.5 0.66666667 0.72413793 0.60606061 0.8 0.79310345] mean value: 0.6617618421622871 key: train_jcc value: [0.67765568 0.68498168 0.66545455 0.69402985 0.64230769 0.70220588 0.69230769 0.7 0.65818182 0.69402985] mean value: 0.6811154694734589 MCC on Blind test: 0.33 Accuracy on Blind test: 0.75 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0104661 0.01030374 0.01144981 0.01024771 0.0105319 0.01100707 0.01029181 0.01024437 0.01042271 0.01044583] mean value: 0.010541105270385742 key: score_time value: [0.00899553 0.00887704 0.00901628 0.00890112 0.00991344 0.0087688 0.00883007 0.00883222 0.00892782 0.00884151] mean value: 0.00899038314819336 key: test_mcc value: [0.9284802 0.6431407 0.60753044 0.67900461 0.67900461 0.64450339 0.57142857 0.63745526 0.77174363 0.74603175] mean value: 0.690832314648529 key: train_mcc value: [0.74057724 0.74448553 0.69260708 0.7804155 0.740478 0.69270547 0.7484655 0.72133625 0.76096895 0.74917652] mean value: 0.7371216028225991 key: test_accuracy value: [0.96428571 0.82142857 0.80357143 0.83928571 0.83928571 0.82142857 0.78571429 0.81818182 0.87272727 0.87272727] mean value: 0.8438636363636364 key: train_accuracy value: [0.87025948 0.87225549 0.84630739 0.89021956 0.87025948 0.84630739 0.8742515 0.86055777 0.88047809 0.87450199] mean value: 0.8685398128046695 key: test_fscore value: [0.96296296 0.80769231 0.8 0.83636364 0.84210526 0.81481481 0.78571429 0.80769231 0.8852459 0.87272727] mean value: 0.8415318752764827 key: train_fscore value: [0.86973948 0.87096774 0.84253579 0.88798371 0.86761711 0.84188912 0.87169043 0.85655738 0.87951807 0.8742515 ] mean value: 0.8662750313964435 key: test_precision value: [0.96296296 0.84 0.81481481 0.85185185 0.82758621 0.84615385 0.78571429 0.84 0.79411765 0.85714286] mean value: 0.8420344472595993 key: train_precision value: [0.86454183 0.87096774 0.85123967 0.89344262 0.87295082 0.85416667 0.87704918 0.87083333 0.876 0.86561265] mean 
value: 0.8696804515198457 key: test_recall value: [0.96296296 0.77777778 0.78571429 0.82142857 0.85714286 0.78571429 0.78571429 0.77777778 1. 0.88888889] mean value: 0.8443121693121693 key: train_recall value: [0.875 0.87096774 0.8340081 0.88259109 0.86234818 0.82995951 0.86639676 0.84274194 0.88306452 0.88306452] mean value: 0.8630142353402116 key: test_roc_auc value: [0.9642401 0.81992337 0.80357143 0.83928571 0.83928571 0.82142857 0.78571429 0.81746032 0.875 0.87301587] mean value: 0.8438925378580551 key: train_roc_auc value: [0.87030632 0.87224276 0.84613791 0.89011444 0.87015047 0.84608212 0.87414326 0.86034735 0.88050864 0.87460312] mean value: 0.8684636394092362 key: test_jcc value: [0.92857143 0.67741935 0.66666667 0.71875 0.72727273 0.6875 0.64705882 0.67741935 0.79411765 0.77419355] mean value: 0.7298969551163574 key: train_jcc value: [0.76950355 0.77142857 0.72791519 0.7985348 0.76618705 0.72695035 0.77256318 0.74910394 0.78494624 0.77659574] mean value: 0.7643728616166219 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01013756 0.01020646 0.01003814 0.00957966 0.01008773 0.01100087 0.01095223 0.01078153 0.01096463 0.01088476] mean value: 0.010463356971740723 key: score_time value: [0.08086848 0.01246262 0.01641536 0.01239705 0.01289415 0.01338482 0.01392913 0.01344013 0.01498032 0.0151093 ] mean value: 0.020588135719299315 key: test_mcc value: [0.85880465 0.53486983 0.4645821 0.35805744 0.42857143 0.61065803 0.3992747 0.52935027 0.49137176 0.60268595] mean value: 0.527822615672422 key: train_mcc value: [0.66070893 0.7007593 0.66886394 0.69260708 0.71683257 0.67367686 0.71790128 0.71743162 0.67772756 0.67363991] mean value: 0.690014906297328 key: test_accuracy value: [0.92857143 0.76785714 0.73214286 0.67857143 0.71428571 0.80357143 0.69642857 0.76363636 0.74545455 0.8 ] mean value: 0.7630519480519481 key: train_accuracy value: [0.83033932 0.8502994 0.83433134 0.84630739 0.85828343 0.83632735 0.85828343 0.85856574 0.83864542 0.83665339] mean value: 0.8448036198519296 key: test_fscore value: [0.92307692 0.75471698 0.72727273 0.66666667 0.71428571 0.79245283 0.66666667 0.74509804 0.73076923 0.78431373] mean value: 0.7505319504764566 key: train_fscore value: [0.82688391 0.84662577 0.82886598 0.84253579 0.85360825 0.82845188 0.85115304 0.85420945 0.83298969 0.83127572] mean value: 0.8396599470532266 key: test_precision value: [0.96 0.76923077 0.74074074 0.69230769 0.71428571 0.84 0.73913043 0.79166667 0.76 0.83333333] mean value: 0.7840695351347525 key: train_precision value: [0.83539095 0.85892116 0.84453782 0.85123967 0.8697479 0.85714286 0.8826087 0.87029289 0.85232068 
0.8487395 ] mean value: 0.857094210276311 key: test_recall value: [0.88888889 0.74074074 0.71428571 0.64285714 0.71428571 0.75 0.60714286 0.7037037 0.7037037 0.74074074] mean value: 0.7206349206349206 key: train_recall value: [0.81854839 0.83467742 0.81376518 0.8340081 0.83805668 0.80161943 0.82186235 0.83870968 0.81451613 0.81451613] mean value: 0.8230279482826173 key: test_roc_auc value: [0.92720307 0.76692209 0.73214286 0.67857143 0.71428571 0.80357143 0.69642857 0.76256614 0.74470899 0.7989418 ] mean value: 0.7625342090859333 key: train_roc_auc value: [0.83022281 0.85014503 0.83404795 0.84613791 0.85800472 0.83584909 0.85778157 0.85833122 0.83836043 0.83639192] mean value: 0.8445272634880454 key: test_jcc value: [0.85714286 0.60606061 0.57142857 0.5 0.55555556 0.65625 0.5 0.59375 0.57575758 0.64516129] mean value: 0.6061106456267746 key: train_jcc value: [0.70486111 0.73404255 0.70774648 0.72791519 0.74460432 0.70714286 0.74087591 0.74551971 0.71378092 0.71126761] mean value: 0.7237756661243875 MCC on Blind test: 0.25 Accuracy on Blind test: 0.68 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02742267 0.02210236 0.02209449 0.0219636 0.02208066 0.02287722 0.02279329 0.02308583 0.02478433 0.02252221] mean value: 0.023172664642333984 key: score_time value: [0.01376057 0.01196885 0.0119679 0.011971 0.01200175 0.01229978 0.01202559 0.01241159 0.01239586 0.01161957] mean value: 0.0122422456741333 key: test_mcc value: [0.89342711 0.82149863 0.89342711 0.71428571 0.71428571 0.85714286 0.71611487 0.71049701 0.75033796 0.82269299] mean value: 0.7893709978851686 key: train_mcc value: [0.78043222 0.79646836 0.79237674 0.80040802 0.80040802 0.78839993 0.80036023 0.80483837 0.80476867 0.78902126] mean value: 0.7957481809735135 key: test_accuracy value: [0.94642857 0.91071429 0.94642857 0.85714286 0.85714286 0.92857143 0.85714286 0.85454545 0.87272727 0.90909091] mean value: 0.8939935064935065 key: train_accuracy value: [0.89021956 0.89820359 0.89620758 0.9001996 0.9001996 0.89421158 0.9001996 0.90239044 0.90239044 0.89442231] mean value: 0.8978644305015467 key: test_fscore value: [0.94545455 0.90566038 0.94736842 0.85714286 0.85714286 0.92857143 0.86206897 0.84615385 0.87719298 0.90196078] mean value: 0.8928717065163764 key: train_fscore value: [0.88933602 0.89779559 0.89430894 0.89919355 0.89919355 0.89292929 0.89878543 0.90180361 0.90140845 0.89421158] mean value: 0.8968965999938038 key: test_precision value: [0.92857143 0.92307692 0.93103448 0.85714286 0.85714286 0.92857143 0.83333333 0.88 0.83333333 0.95833333] mean value: 0.8930539977264116 key: train_precision value: [0.8875502 0.89243028 0.89795918 0.89558233 0.89558233 0.89112903 0.89878543 0.89641434 
0.89959839 0.88537549] mean value: 0.8940407009629887 key: test_recall value: [0.96296296 0.88888889 0.96428571 0.85714286 0.85714286 0.92857143 0.89285714 0.81481481 0.92592593 0.85185185] mean value: 0.8944444444444444 key: train_recall value: [0.89112903 0.90322581 0.89068826 0.90283401 0.90283401 0.89473684 0.89878543 0.90725806 0.90322581 0.90322581] mean value: 0.8997943058639154 key: test_roc_auc value: [0.94699872 0.90996169 0.94642857 0.85714286 0.85714286 0.92857143 0.85714286 0.85383598 0.87367725 0.90806878] mean value: 0.8938970990695129 key: train_roc_auc value: [0.89022855 0.89825322 0.89613153 0.9002359 0.9002359 0.89421881 0.90018011 0.90244793 0.9024003 0.89452629] mean value: 0.8978858554311018 key: test_jcc value: [0.89655172 0.82758621 0.9 0.75 0.75 0.86666667 0.75757576 0.73333333 0.78125 0.82142857] mean value: 0.8084392260038812 key: train_jcc value: [0.80072464 0.81454545 0.80882353 0.81684982 0.81684982 0.80656934 0.81617647 0.82116788 0.82051282 0.80866426] mean value: 0.8130884032644238 MCC on Blind test: 0.23 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.94465923 2.00514841 2.06666064 2.05695105 2.04768014 1.9460156 2.0034411 1.29994416 2.12981367 2.01057458] mean value: 1.9510888576507568 key: score_time value: [0.01509047 0.01248336 0.02142859 0.01991558 0.02001476 0.01998496 0.0137403 0.01238942 0.01494431 0.02116203] mean value: 0.01711537837982178 key: test_mcc value: [0.82195294 0.71392082 0.85933785 0.60753044 0.67900461 0.83484711 0.71428571 0.78174603 0.82337971 0.8565805 ] mean value: 0.7692585719374775 key: train_mcc value: [0.98803016 0.99601537 0.99601492 1. 0.99601492 0.99601492 0.99204516 0.94820483 0.98409121 0.9841835 ] mean value: 0.9880614980481577 key: test_accuracy value: [0.91071429 0.85714286 0.92857143 0.80357143 0.83928571 0.91071429 0.85714286 0.89090909 0.90909091 0.92727273] mean value: 0.8834415584415585 key: train_accuracy value: [0.99401198 0.99800399 0.99800399 1. 0.99800399 0.99800399 0.99600798 0.97410359 0.99203187 0.99203187] mean value: 0.9940203258821003 key: test_fscore value: [0.90909091 0.85185185 0.93103448 0.80701754 0.84210526 0.90196078 0.85714286 0.88888889 0.9122807 0.92307692] mean value: 0.8824450205895706 key: train_fscore value: [0.99393939 0.9979798 0.9979716 1. 
0.9979716 0.9979716 0.99593496 0.97373737 0.99190283 0.99186992] mean value: 0.9939279085015674 key: test_precision value: [0.89285714 0.85185185 0.9 0.79310345 0.82758621 1. 0.85714286 0.88888889 0.86666667 0.96 ] mean value: 0.8838097062579822 key: train_precision value: [0.99595142 1. 1. 1. 1. 1. 1. 0.9757085 0.99593496 1. ] mean value: 0.9967594878377933 key: test_recall value: [0.92592593 0.85185185 0.96428571 0.82142857 0.85714286 0.82142857 0.85714286 0.88888889 0.96296296 0.88888889] mean value: 0.883994708994709 key: train_recall value: [0.99193548 0.99596774 0.99595142 1. 0.99595142 0.99595142 0.99190283 0.97177419 0.98790323 0.98387097] mean value: 0.9911208697923468 key: test_roc_auc value: [0.91123883 0.85696041 0.92857143 0.80357143 0.83928571 0.91071429 0.85714286 0.89087302 0.91005291 0.9265873 ] mean value: 0.8834998175515417 key: train_roc_auc value: [0.99399146 0.99798387 0.99797571 1. 0.99797571 0.99797571 0.99595142 0.97407607 0.99198311 0.99193548] mean value: 0.9939848536817699 key: test_jcc value: [0.83333333 0.74193548 0.87096774 0.67647059 0.72727273 0.82142857 0.75 0.8 0.83870968 0.85714286] mean value: 0.791726098063859 key: train_jcc value: [0.98795181 0.99596774 0.99595142 1. 
0.99595142 0.99595142 0.99190283 0.9488189 0.98393574 0.98387097] mean value: 0.9880302242536261 MCC on Blind test: 0.25 Accuracy on Blind test: 0.63 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03864717 0.02425408 0.02131462 0.02202463 0.01960993 0.02493072 0.02551031 0.02189636 0.0220654 0.02170563] mean value: 0.02419588565826416 key: score_time value: [0.00941229 0.00976038 0.00921798 0.00955081 0.00952029 0.00929213 0.00879693 0.00893784 0.00880909 0.00895596] mean value: 0.00922536849975586 key: test_mcc value: [0.96481304 0.89342711 0.82618439 0.85933785 0.85714286 0.82195294 0.85933785 0.85695439 1. 0.78961518] mean value: 0.8728765613403809 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98214286 0.94642857 0.91071429 0.92857143 0.92857143 0.91071429 0.92857143 0.92727273 1. 0.89090909] mean value: 0.9353896103896104 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98113208 0.94545455 0.90566038 0.93103448 0.92857143 0.9122807 0.93103448 0.92857143 1. 0.88 ] mean value: 0.9343739522699218 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [1. 0.92857143 0.96 0.9 0.92857143 0.89655172 0.9 0.89655172 1. 0.95652174] mean value: 0.9366768044549154 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 0.96296296 0.85714286 0.96428571 0.92857143 0.92857143 0.96428571 0.96296296 1. 0.81481481] mean value: 0.9346560846560846 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98148148 0.94699872 0.91071429 0.92857143 0.92857143 0.91071429 0.92857143 0.92791005 1. 0.88955026] mean value: 0.9353083378945448 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96296296 0.89655172 0.82758621 0.87096774 0.86666667 0.83870968 0.87096774 0.86666667 1. 0.78571429] mean value: 0.8786793674335387 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.47 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.13840342 0.1259439 0.12937999 0.12996531 0.12474942 0.13039589 0.1298151 0.12556314 0.12853265 0.12834382] mean value: 0.12910926342010498 key: score_time value: [0.01946807 0.01797915 0.01792121 0.0182476 0.01871538 0.017977 0.01843834 0.01799345 0.01811767 0.01798201] mean value: 0.018283987045288087 key: test_mcc value: [0.9284802 0.60652703 0.85933785 0.75047877 0.75047877 0.85714286 0.75047877 0.78174603 0.81878307 0.81854376] mean value: 0.7921997127312963 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96428571 0.80357143 0.92857143 0.875 0.875 0.92857143 0.875 0.89090909 0.90909091 0.90909091] mean value: 0.8959090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.79245283 0.92592593 0.87719298 0.87719298 0.92857143 0.87719298 0.88888889 0.90909091 0.90566038] mean value: 0.8945132270355706 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 0.80769231 0.96153846 0.86206897 0.86206897 0.92857143 0.86206897 0.88888889 0.89285714 0.92307692] mean value: 0.8951795012139839 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 0.77777778 0.89285714 0.89285714 0.89285714 0.92857143 0.89285714 0.88888889 0.92592593 0.88888889] mean value: 0.8944444444444445 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.9642401 0.80268199 0.92857143 0.875 0.875 0.92857143 0.875 0.89087302 0.90939153 0.90873016] mean value: 0.8958059660645867 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.65625 0.86206897 0.78125 0.78125 0.86666667 0.78125 0.8 0.83333333 0.82758621] mean value: 0.8118226600985222 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.71 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01071095 0.01106501 0.01063919 0.01035523 0.01046157 0.01181698 0.01045609 0.01055861 0.01104879 0.01108813] mean value: 0.01082005500793457 key: score_time value: [0.00896955 0.00953364 0.00946403 0.00884151 0.00904202 0.00962973 0.00890636 0.00897074 0.00888109 0.00932431] mean value: 0.009156298637390137 key: test_mcc value: [0.7549598 0.18170219 0.50128041 0.4330127 0.53881591 0.40574111 0.75434227 0.56441351 0.53121272 0.60876172] mean value: 0.5274242337751618 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.875 0.58928571 0.75 0.71428571 0.76785714 0.69642857 0.875 0.78181818 0.76363636 0.8 ] mean value: 0.7613311688311688 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.87719298 0.59649123 0.74074074 0.69230769 0.77966102 0.73015873 0.86792453 0.76923077 0.77192982 0.7755102 ] mean value: 0.7601147716858323 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.56666667 0.76923077 0.75 0.74193548 0.65714286 0.92 0.8 0.73333333 0.86363636] mean value: 0.7635278807214291 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92592593 0.62962963 0.71428571 0.64285714 0.82142857 0.82142857 0.82142857 0.74074074 0.81481481 0.7037037 ] mean value: 0.7636243386243386 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87675607 0.59067688 0.75 0.71428571 0.76785714 0.69642857 0.875 0.78108466 0.76455026 0.79828042] mean value: 0.7614919722678344 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.78125 0.425 0.58823529 0.52941176 0.63888889 0.575 0.76666667 0.625 0.62857143 0.63333333] mean value: 0.6191357376283847 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.65 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.87882328 1.90987062 1.84491491 1.94657969 1.94930863 1.92319202 1.93694496 1.91315603 1.92559934 1.88178062] mean value: 1.9110170125961303 key: score_time value: [0.09425449 0.09519291 0.09458399 0.10057092 0.09676242 0.0935111 0.09894633 0.09861422 0.09709573 0.10022926] mean value: 0.09697613716125489 key: test_mcc value: [1. 0.85951469 0.92857143 0.93094934 0.82195294 0.96490128 0.92857143 0.85449735 1. 0.89602867] mean value: 0.9184987133098356 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.92857143 0.96428571 0.96428571 0.91071429 0.98214286 0.96428571 0.92727273 1. 0.94545455] mean value: 0.9587012987012987 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.92857143 0.96428571 0.96551724 0.90909091 0.98181818 0.96428571 0.92592593 1. 0.94117647] mean value: 0.958067158594542 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 
0.89655172 0.96428571 0.93333333 0.92592593 1. 0.96428571 0.92592593 1. 1. ] mean value: 0.9610308337894545 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96296296 0.96428571 1. 0.89285714 0.96428571 0.96428571 0.92592593 1. 0.88888889] mean value: 0.9563492063492064 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92975734 0.96428571 0.96428571 0.91071429 0.98214286 0.96428571 0.92724868 1. 0.94444444] mean value: 0.9587164750957855 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.86666667 0.93103448 0.93333333 0.83333333 0.96428571 0.93103448 0.86206897 1. 0.88888889] mean value: 0.921064586754242 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.61 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.89666176 1.00527024 1.07393312 1.04313874 1.00363135 1.01725841 0.97884154 1.00802183 1.00800323 0.97835732] mean value: 1.1013117551803588 key: score_time value: [0.26713657 0.2600987 0.2608161 0.24226403 0.25155401 0.22661448 0.25301027 0.28414631 0.28104687 0.2326901 ] mean value: 0.2559377431869507 key: test_mcc value: [1. 0.74984143 0.92857143 0.93094934 0.82195294 0.85714286 0.92857143 0.85449735 1. 0.8565805 ] mean value: 0.8928107284174787 key: train_mcc value: [0.94410621 0.95608932 0.95212215 0.95608442 0.96010711 0.9441372 0.94410086 0.95617454 0.94820977 0.9562436 ] mean value: 0.9517375181782505 key: test_accuracy value: [1. 0.875 0.96428571 0.96428571 0.91071429 0.92857143 0.96428571 0.92727273 1. 0.92727273] mean value: 0.9461688311688312 key: train_accuracy value: [0.97205589 0.97804391 0.9760479 0.97804391 0.98003992 0.97205589 0.97205589 0.97808765 0.97410359 0.97808765] mean value: 0.9758622197835405 key: test_fscore value: [1. 0.86792453 0.96428571 0.96551724 0.90909091 0.92857143 0.96428571 0.92592593 1. 0.92307692] mean value: 0.9448678384917812 key: train_fscore value: [0.97177419 0.97777778 0.97580645 0.97768763 0.97983871 0.97177419 0.97165992 0.97777778 0.97384306 0.97795591] mean value: 0.9755895619919588 key: test_precision value: [1. 0.88461538 0.96428571 0.93333333 0.92592593 0.92857143 0.96428571 0.92592593 1. 0.96 ] mean value: 0.9486943426943427 key: train_precision value: [0.97177419 0.97975709 0.97188755 0.9796748 0.97590361 0.96787149 0.97165992 0.97975709 0.97188755 0.97211155] mean value: 0.9742284833953254 key: test_recall value: [1. 
0.85185185 0.96428571 1. 0.89285714 0.92857143 0.96428571 0.92592593 1. 0.88888889] mean value: 0.9416666666666667 key: train_recall value: [0.97177419 0.97580645 0.97975709 0.9757085 0.98380567 0.9757085 0.97165992 0.97580645 0.97580645 0.98387097] mean value: 0.9769704192242392 key: test_roc_auc value: [1. 0.87420179 0.96428571 0.96428571 0.91071429 0.92857143 0.96428571 0.92724868 1. 0.9265873 ] mean value: 0.9460180623973727 key: train_roc_auc value: [0.9720531 0.9780218 0.97609901 0.97801173 0.98009181 0.97210622 0.97205043 0.97806071 0.9741237 0.97815596] mean value: 0.9758774476377025 key: test_jcc value: [1. 0.76666667 0.93103448 0.93333333 0.83333333 0.86666667 0.93103448 0.86206897 1. 0.85714286] mean value: 0.8981280788177339 key: train_jcc value: [0.94509804 0.95652174 0.95275591 0.95634921 0.96047431 0.94509804 0.94488189 0.95652174 0.94901961 0.95686275] mean value: 0.9523583219558611 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02486706 0.01028371 0.01111889 0.01084995 0.01055169 0.01073956 0.01036692 0.01024008 0.01063538 0.01101923] mean value: 0.012067246437072753 key: score_time value: [0.01320982 0.00907779 0.00915051 0.00921273 0.00907063 0.00885177 0.00918436 0.00913143 0.00959134 0.00949717] mean value: 0.00959775447845459 key: test_mcc value: [0.9284802 0.6431407 0.60753044 0.67900461 0.67900461 0.64450339 0.57142857 0.63745526 0.77174363 0.74603175] mean value: 0.690832314648529 key: train_mcc value: [0.74057724 0.74448553 0.69260708 0.7804155 0.740478 0.69270547 0.7484655 0.72133625 0.76096895 0.74917652] mean value: 0.7371216028225991 key: test_accuracy value: [0.96428571 0.82142857 0.80357143 0.83928571 0.83928571 0.82142857 0.78571429 0.81818182 0.87272727 0.87272727] mean value: 0.8438636363636364 key: train_accuracy value: [0.87025948 0.87225549 0.84630739 0.89021956 0.87025948 0.84630739 0.8742515 0.86055777 0.88047809 0.87450199] mean value: 0.8685398128046695 key: test_fscore value: [0.96296296 0.80769231 0.8 0.83636364 0.84210526 0.81481481 0.78571429 0.80769231 0.8852459 0.87272727] mean value: 0.8415318752764827 key: train_fscore value: [0.86973948 0.87096774 0.84253579 0.88798371 0.86761711 0.84188912 0.87169043 0.85655738 0.87951807 0.8742515 ] mean value: 0.8662750313964435 key: test_precision value: [0.96296296 0.84 0.81481481 0.85185185 0.82758621 0.84615385 0.78571429 0.84 0.79411765 0.85714286] mean value: 0.8420344472595993 key: train_precision value: [0.86454183 0.87096774 0.85123967 0.89344262 0.87295082 0.85416667 0.87704918 0.87083333 0.876 0.86561265] mean 
value: 0.8696804515198457 key: test_recall value: [0.96296296 0.77777778 0.78571429 0.82142857 0.85714286 0.78571429 0.78571429 0.77777778 1. 0.88888889] mean value: 0.8443121693121693 key: train_recall value: [0.875 0.87096774 0.8340081 0.88259109 0.86234818 0.82995951 0.86639676 0.84274194 0.88306452 0.88306452] mean value: 0.8630142353402116 key: test_roc_auc value: [0.9642401 0.81992337 0.80357143 0.83928571 0.83928571 0.82142857 0.78571429 0.81746032 0.875 0.87301587] mean value: 0.8438925378580551 key: train_roc_auc value: [0.87030632 0.87224276 0.84613791 0.89011444 0.87015047 0.84608212 0.87414326 0.86034735 0.88050864 0.87460312] mean value: 0.8684636394092362 key: test_jcc value: [0.92857143 0.67741935 0.66666667 0.71875 0.72727273 0.6875 0.64705882 0.67741935 0.79411765 0.77419355] mean value: 0.7298969551163574 key: train_jcc value: [0.76950355 0.77142857 0.72791519 0.7985348 0.76618705 0.72695035 0.77256318 0.74910394 0.78494624 0.77659574] mean value: 0.7643728616166219 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.12475324 0.08080173 0.08375597 0.08054066 0.08045411 0.09029341 0.08177567 0.07712722 0.07821131 0.07184291] mean value: 0.08495562076568604 key: score_time value: [0.01098704 0.01110721 0.01107979 0.01108408 0.01103806 0.01121902 0.01095128 0.01111293 0.01077819 0.01080561] mean value: 0.011016321182250977 key: test_mcc value: [1. 0.89342711 0.89342711 0.93094934 0.89342711 0.96490128 0.96490128 0.89153439 1. 0.89139151] mean value: 0.932395913795342 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94642857 0.94642857 0.96428571 0.94642857 0.98214286 0.98214286 0.94545455 1. 0.94545455] mean value: 0.9658766233766234 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94545455 0.94545455 0.96551724 0.94736842 0.98181818 0.98245614 0.94545455 1. 0.94339623] mean value: 0.9656919847379731 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.92857143 0.96296296 0.93333333 0.93103448 1. 0.96551724 0.92857143 1. 0.96153846] mean value: 0.9611529339115547 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96296296 0.92857143 1. 0.96428571 0.96428571 1. 0.96296296 1. 0.92592593] mean value: 0.9708994708994709 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.94699872 0.94642857 0.96428571 0.94642857 0.98214286 0.98214286 0.9457672 1. 0.94510582] mean value: 0.965930031016238 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.89655172 0.89655172 0.93333333 0.9 0.96428571 0.96551724 0.89655172 1. 0.89285714] mean value: 0.9345648604269294 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.35 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05492663 0.05176282 0.10891724 0.06809163 0.04307985 0.04353976 0.07512021 0.06407142 0.07399178 0.08978105] mean value: 0.06732823848724365 key: score_time value: [0.01222754 0.0186336 0.01953959 0.01344514 0.01201749 0.01820254 0.01937747 0.01236653 0.01234722 0.02103639] mean value: 0.01591935157775879 key: test_mcc value: [0.72273097 0.78544061 0.85714286 0.75047877 0.78772636 0.82618439 0.71611487 0.74569602 0.71735629 0.8565805 ] mean value: 0.7765451645733424 key: train_mcc value: [0.89240405 0.89647918 0.89632475 0.89622747 0.91623104 0.9124496 0.90430958 0.90450187 0.90039607 0.89261761] mean value: 0.9011941219984033 key: test_accuracy value: [0.85714286 0.89285714 0.92857143 0.875 0.89285714 0.91071429 0.85714286 0.87272727 
0.85454545 0.92727273] mean value: 0.8868831168831168 key: train_accuracy value: [0.94610778 0.94810379 0.94810379 0.94810379 0.95808383 0.95608782 0.95209581 0.95219124 0.9501992 0.94621514] mean value: 0.950529220443575 key: test_fscore value: [0.86206897 0.88888889 0.92857143 0.87719298 0.89655172 0.90566038 0.86206897 0.86792453 0.86206897 0.92307692] mean value: 0.8874073749343413 key: train_fscore value: [0.94610778 0.94820717 0.94779116 0.94758065 0.95774648 0.956 0.95180723 0.952 0.94969819 0.94610778] mean value: 0.9503046446920652 key: test_precision value: [0.80645161 0.88888889 0.92857143 0.86206897 0.86666667 0.96 0.83333333 0.88461538 0.80645161 0.96 ] mean value: 0.8797047893399395 key: train_precision value: [0.93675889 0.93700787 0.94023904 0.9437751 0.952 0.94466403 0.94422311 0.94444444 0.94779116 0.93675889] mean value: 0.9427662553096674 key: test_recall value: [0.92592593 0.88888889 0.92857143 0.89285714 0.92857143 0.85714286 0.89285714 0.85185185 0.92592593 0.88888889] mean value: 0.8981481481481481 key: train_recall value: [0.95564516 0.95967742 0.95546559 0.951417 0.96356275 0.96761134 0.95951417 0.95967742 0.9516129 0.95564516] mean value: 0.9579828914718558 key: test_roc_auc value: [0.85951469 0.89272031 0.92857143 0.875 0.89285714 0.91071429 0.85714286 0.8723545 0.85582011 0.9265873 ] mean value: 0.8871282612661924 key: train_roc_auc value: [0.94620203 0.94821816 0.94820523 0.94814945 0.95815933 0.95624661 0.95219803 0.95227965 0.9502159 0.94632652] mean value: 0.950620090969503 key: test_jcc value: [0.75757576 0.8 0.86666667 0.78125 0.8125 0.82758621 0.75757576 0.76666667 0.75757576 0.85714286] mean value: 0.7984539670100015 key: train_jcc value: [0.89772727 0.90151515 0.90076336 0.90038314 0.91891892 0.91570881 0.90804598 0.90839695 0.90421456 0.89772727] mean value: 0.9053401411653583 MCC on Blind test: 0.22 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01412702 0.01368785 0.01042676 0.01001048 0.00986242 0.00989866 0.00987029 0.0099175 0.01003695 0.00978541] mean value: 0.010762333869934082 key: score_time value: [0.0248754 0.00975728 0.00907731 0.00859451 0.00872374 0.00872278 0.00863624 0.00862908 0.00866938 0.00869274] mean value: 0.010437846183776855 key: test_mcc value: [0.89342711 0.6431407 0.68965631 0.68250015 0.60753044 0.67900461 0.64450339 0.67328042 0.71735629 0.81854376] mean value: 0.7048943172837225 key: train_mcc value: [0.75247462 0.72067111 0.6806649 0.76441802 0.73651066 0.72061769 0.7364755 0.74502957 0.74898578 0.76490153] mean value: 0.7370749397486276 key: test_accuracy value: [0.94642857 0.82142857 0.83928571 0.83928571 0.80357143 0.83928571 0.82142857 0.83636364 0.85454545 0.90909091] mean value: 0.8510714285714286 key: train_accuracy value: [0.8762475 0.86027944 0.84031936 0.88223553 0.86826347 0.86027944 0.86826347 0.87250996 0.87450199 0.88247012] mean value: 0.8685370295266042 key: test_fscore value: [0.94545455 0.80769231 0.82352941 0.83018868 0.8 0.83636364 0.81481481 0.83636364 0.86206897 0.90566038] mean value: 0.8462136374574661 key: train_fscore value: [0.87449393 0.85714286 0.83606557 0.88032454 0.86530612 0.85655738 0.86639676 0.8699187 0.87221095 0.88080808] mean value: 0.8659224895623094 key: test_precision value: 
[0.92857143 0.84 0.91304348 0.88 0.81481481 0.85185185 0.84615385 0.82142857 0.80645161 0.92307692] mean value: 0.8625392527061531 key: train_precision value: [0.87804878 0.8677686 0.84647303 0.88211382 0.87242798 0.86721992 0.86639676 0.87704918 0.87755102 0.88259109] mean value: 0.8717640181251569 key: test_recall value: [0.96296296 0.77777778 0.75 0.78571429 0.78571429 0.82142857 0.78571429 0.85185185 0.92592593 0.88888889] mean value: 0.8335978835978836 key: train_recall value: [0.87096774 0.84677419 0.82591093 0.87854251 0.8582996 0.84615385 0.86639676 0.86290323 0.86693548 0.87903226] mean value: 0.8601916546950503 key: test_roc_auc value: [0.94699872 0.81992337 0.83928571 0.83928571 0.80357143 0.83928571 0.82142857 0.83664021 0.85582011 0.90873016] mean value: 0.851096971355592 key: train_roc_auc value: [0.87619533 0.86014599 0.84012082 0.88218464 0.86812618 0.8600848 0.86823775 0.87239649 0.87441262 0.88242951] mean value: 0.86843341410175 key: test_jcc value: [0.89655172 0.67741935 0.7 0.70967742 0.66666667 0.71875 0.6875 0.71875 0.75757576 0.82758621] mean value: 0.7360477129470455 key: train_jcc value: [0.77697842 0.75 0.71830986 0.78623188 0.76258993 0.74910394 0.76428571 0.76978417 0.77338129 0.78700361] mean value: 0.7637668823208889 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01758575 0.0198431 0.02237058 0.02029872 0.0240736 0.02260089 0.01758718 0.01784587 0.01909232 0.01727676] mean value: 0.01985747814178467 key: score_time value: [0.00994825 0.01115131 0.01172638 0.01171517 0.01169991 0.0116601 0.01156712 0.01167297 0.0117054 0.01167989] mean value: 0.011452651023864746 key: test_mcc value: [0.70299234 0.78799489 0.85714286 0.71428571 0.67900461 0.92857143 0.71659857 0.60876172 0.82337971 0.8565805 ] mean value: 0.7675312343241391 key: train_mcc value: [0.8286992 0.86914588 0.90049318 0.88030173 0.91678491 0.85174413 0.78363444 0.84370235 0.86453703 0.86480823] mean value: 0.8603851093775248 key: test_accuracy value: [0.83928571 0.89285714 0.92857143 0.85714286 0.83928571 0.96428571 0.83928571 0.8 0.90909091 0.92727273] mean value: 0.8797077922077923 key: train_accuracy value: [0.91017964 0.93413174 0.9500998 0.94011976 0.95808383 0.9241517 0.88423154 0.92031873 0.93227092 0.93227092] mean value: 0.928585856176094 key: test_fscore value: [0.85245902 0.89285714 0.92857143 0.85714286 0.84210526 0.96428571 0.80851064 0.7755102 0.9122807 0.92307692] mean value: 0.8756799889619294 key: train_fscore value: [0.91525424 0.93491124 0.9486653 0.93877551 0.9582505 0.92635659 0.86936937 0.91561181 0.93117409 0.93227092] mean value: 0.9270639563121068 key: test_precision value: [0.76470588 0.86206897 0.92857143 0.85714286 0.82758621 0.96428571 1. 
0.86363636 0.86666667 0.96 ] mean value: 0.8894664085069764 key: train_precision value: [0.85865724 0.91505792 0.9625 0.94650206 0.94140625 0.88847584 0.97969543 0.96017699 0.93495935 0.92125984] mean value: 0.930869091765427 key: test_recall value: [0.96296296 0.92592593 0.92857143 0.85714286 0.85714286 0.96428571 0.67857143 0.7037037 0.96296296 0.88888889] mean value: 0.873015873015873 key: train_recall value: [0.97983871 0.95564516 0.93522267 0.93117409 0.9757085 0.96761134 0.78137652 0.875 0.92741935 0.94354839] mean value: 0.9272544730312133 key: test_roc_auc value: [0.84355045 0.89399745 0.92857143 0.85714286 0.83928571 0.96428571 0.83928571 0.79828042 0.91005291 0.9265873 ] mean value: 0.880103995621237 key: train_roc_auc value: [0.91086797 0.93434432 0.9498948 0.93999649 0.95832669 0.92475055 0.88281424 0.91978346 0.93221361 0.93240411] mean value: 0.9285396264194379 key: test_jcc value: [0.74285714 0.80645161 0.86666667 0.75 0.72727273 0.93103448 0.67857143 0.63333333 0.83870968 0.85714286] mean value: 0.7832039928925357 key: train_jcc value: [0.84375 0.87777778 0.90234375 0.88461538 0.91984733 0.86281588 0.7689243 0.84435798 0.87121212 0.87313433] mean value: 0.8648778854126843 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02320075 0.02007747 0.02243185 0.02290845 0.0215466 0.01983714 0.02546191 0.01764059 0.0193162 0.01997018] mean value: 0.021239113807678223 key: score_time value: [0.01175237 0.01165366 0.01168561 0.01169753 0.01166391 0.0116353 0.01183295 0.01159787 0.01166773 0.0116539 ] mean value: 0.011684083938598632 key: test_mcc value: [0.96490128 0.66026156 0.72168784 0.82195294 0.49487166 0.89802651 0.70082556 0.60242771 0.79069197 0.8565805 ] mean value: 0.7512227529393891 key: train_mcc value: [0.88154038 0.81002887 0.87979163 0.90022801 0.65420186 0.88429173 0.84429267 0.7060842 0.89261761 0.89653312] mean value: 0.8349610067434581 key: test_accuracy value: [0.98214286 0.82142857 0.85714286 0.91071429 0.71428571 0.94642857 0.83928571 0.78181818 0.89090909 0.92727273] mean value: 0.8671428571428571 key: train_accuracy value: [0.94011976 0.9001996 0.93812375 0.9500998 0.80239521 0.94211577 0.91816367 0.83466135 0.94621514 0.94820717] mean value: 0.9120301230208905 key: test_fscore value: [0.98181818 0.83333333 0.84615385 0.9122807 0.61904762 0.94339623 0.81632653 0.72727273 0.89655172 0.92307692] mean value: 0.8499257813622287 key: train_fscore value: [0.93775934 0.90636704 0.93418259 0.9490835 0.75062972 0.9416499 0.91067538 0.8 0.94610778 0.948 ] mean value: 0.902445525859967 key: test_precision value: [0.96428571 0.75757576 0.91666667 0.89655172 0.92857143 1. 
0.95238095 0.94117647 0.83870968 0.96 ] mean value: 0.9155918391626041 key: train_precision value: [0.96581197 0.84615385 0.98214286 0.95491803 0.99333333 0.936 0.98584906 0.99401198 0.93675889 0.94047619] mean value: 0.9535456151637388 key: test_recall value: [1. 0.92592593 0.78571429 0.92857143 0.46428571 0.89285714 0.71428571 0.59259259 0.96296296 0.88888889] mean value: 0.8156084656084656 key: train_recall value: [0.91129032 0.97580645 0.89068826 0.94331984 0.60323887 0.94736842 0.84615385 0.66935484 0.95564516 0.95564516] mean value: 0.8698511166253102 key: test_roc_auc value: [0.98275862 0.82503193 0.85714286 0.91071429 0.71428571 0.94642857 0.83928571 0.77843915 0.89219577 0.9265873 ] mean value: 0.8672869914249225 key: train_roc_auc value: [0.93983488 0.9009467 0.93747011 0.95000638 0.79965093 0.94218815 0.91717141 0.83270892 0.94632652 0.94829502] mean value: 0.9114599020928051 key: test_jcc value: [0.96428571 0.71428571 0.73333333 0.83870968 0.44827586 0.89285714 0.68965517 0.57142857 0.8125 0.85714286] mean value: 0.7522474045235447 key: train_jcc value: [0.8828125 0.82876712 0.87649402 0.90310078 0.60080645 0.88973384 0.836 0.66666667 0.89772727 0.90114068] mean value: 0.8283249338107523 MCC on Blind test: 0.19 Accuracy on Blind test: 0.45 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20143199 0.18603349 0.18654966 0.18629313 0.18640566 0.18552709 0.1855998 0.18528461 0.18386316 0.18543124] mean value: 0.1872419834136963 key: score_time value: [0.01513767 0.01660371 0.01517773 0.01519632 0.01537299 0.01546884 0.01524711 0.01526022 0.01523232 0.0154779 ] mean value: 0.01541748046875 key: test_mcc value: [1. 0.89315584 0.92857143 0.89342711 0.78571429 0.93094934 0.93094934 0.89153439 1. 0.89139151] mean value: 0.9145693236924946 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94642857 0.96428571 0.94642857 0.89285714 0.96428571 0.96428571 0.94545455 1. 0.94545455] mean value: 0.956948051948052 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94339623 0.96428571 0.94736842 0.89285714 0.96296296 0.96296296 0.94545455 1. 0.94339623] mean value: 0.9562684202406149 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96153846 0.96428571 0.93103448 0.89285714 1. 1. 0.92857143 1. 0.96153846] mean value: 0.9639825691549829 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.92592593 0.96428571 0.96428571 0.89285714 0.92857143 0.92857143 0.96296296 1. 0.92592593] mean value: 0.9493386243386244 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.94572158 0.96428571 0.94642857 0.89285714 0.96428571 0.96428571 0.9457672 1. 
0.94510582] mean value: 0.9568737456668491 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.89285714 0.93103448 0.9 0.80645161 0.92857143 0.92857143 0.89655172 1. 0.89285714] mean value: 0.9176894962656921 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.43 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.07632256 0.06938648 0.06508446 0.06937695 0.0748291 0.08184791 0.06407475 0.05834961 0.05477071 0.06439114] mean value: 0.06784336566925049 key: score_time value: [0.02440453 0.0258677 0.03429985 0.02032709 0.03834176 0.02059245 0.02165937 0.02194166 0.0186646 0.024894 ] mean value: 0.025099301338195802 key: test_mcc value: [1. 0.82149863 0.96490128 0.93094934 0.82618439 0.96490128 0.96490128 0.89153439 1. 0.89139151] mean value: 0.925626210567648 key: train_mcc value: [0.98403035 0.98803016 0.98802882 0.99204516 0.99601537 0.99201441 0.99201441 0.99602309 0.9760922 0.99203073] mean value: 0.9896324686793974 key: test_accuracy value: [1. 0.91071429 0.98214286 0.96428571 0.91071429 0.98214286 0.98214286 0.94545455 1. 0.94545455] mean value: 0.9623051948051948 key: train_accuracy value: [0.99201597 0.99401198 0.99401198 0.99600798 0.99800399 0.99600798 0.99600798 0.99800797 0.98804781 0.99601594] mean value: 0.9948139577418867 key: test_fscore value: [1. 
0.90566038 0.98245614 0.96551724 0.91525424 0.98181818 0.98245614 0.94545455 1. 0.94339623] mean value: 0.9622013090415512 key: train_fscore value: [0.99193548 0.99393939 0.99391481 0.99593496 0.9979798 0.99595142 0.99595142 0.9979798 0.98790323 0.99596774] mean value: 0.9947458042171815 key: test_precision value: [1. 0.92307692 0.96551724 0.93333333 0.87096774 1. 0.96551724 0.92857143 1. 0.96153846] mean value: 0.9548522371214251 key: train_precision value: [0.99193548 0.99595142 0.99593496 1. 0.99596774 0.99595142 0.99595142 1. 0.98790323 0.99596774] mean value: 0.9955563403910126 key: test_recall value: [1. 0.88888889 1. 1. 0.96428571 0.96428571 1. 0.96296296 1. 0.92592593] mean value: 0.9706349206349206 key: train_recall value: [0.99193548 0.99193548 0.99190283 0.99190283 1. 0.99595142 0.99595142 0.99596774 0.98790323 0.99596774] mean value: 0.9939418179443646 key: test_roc_auc value: [1. 0.90996169 0.98214286 0.96428571 0.91071429 0.98214286 0.98214286 0.9457672 1. 0.94510582] mean value: 0.9622263273125342 key: train_roc_auc value: [0.99201517 0.99399146 0.99398291 0.99595142 0.9980315 0.9960072 0.9960072 0.99798387 0.9880461 0.99601537] mean value: 0.994803220447082 key: test_jcc value: [1. 0.82758621 0.96551724 0.93333333 0.84375 0.96428571 0.96551724 0.89655172 1. 
0.89285714] mean value: 0.9289398604269294 key: train_jcc value: [0.984 0.98795181 0.98790323 0.99190283 0.99596774 0.99193548 0.99193548 0.99596774 0.97609562 0.99196787] mean value: 0.9895627807672192 MCC on Blind test: 0.11 Accuracy on Blind test: 0.31 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.15868926 0.13006496 0.15043187 0.17151141 0.18619037 0.20392632 0.16918993 0.16662097 0.17729712 0.17640066] mean value: 0.16903228759765626 key: score_time value: [0.02504158 0.02538013 0.01512814 0.02922416 0.02523613 0.029881 0.02491045 0.0251205 0.02523446 0.02540874] mean value: 0.02505652904510498 key: test_mcc value: [0.78691666 0.64240102 0.67900461 0.57142857 0.57142857 0.71611487 0.57735027 0.67328042 0.67328042 0.71049701] mean value: 0.6601702430541447 key: train_mcc value: [0.98415334 0.98809222 0.98809052 0.98415084 0.98405842 0.98415084 0.98415084 0.98811501 0.9841835 0.9841835 ] mean value: 0.9853329016739125 key: test_accuracy value: [0.89285714 0.82142857 0.83928571 0.78571429 0.78571429 0.85714286 0.78571429 0.83636364 0.83636364 0.85454545] mean value: 0.829512987012987 key: train_accuracy value: [0.99201597 0.99401198 0.99401198 0.99201597 0.99201597 0.99201597 
0.99201597 0.9940239 0.99203187 0.99203187] mean value: 0.992619144181756 key: test_fscore value: [0.88461538 0.81481481 0.83636364 0.78571429 0.78571429 0.85185185 0.76923077 0.83636364 0.83636364 0.84615385] mean value: 0.8247186147186147 key: train_fscore value: [0.99186992 0.99391481 0.99389002 0.99183673 0.99186992 0.99183673 0.99183673 0.99391481 0.99186992 0.99186992] mean value: 0.9924709513849441 key: test_precision value: [0.92 0.81481481 0.85185185 0.78571429 0.78571429 0.88461538 0.83333333 0.82142857 0.82142857 0.88 ] mean value: 0.8398901098901099 key: train_precision value: [1. 1. 1. 1. 0.99591837 1. 1. 1. 1. 1. ] mean value: 0.9995918367346939 key: test_recall value: [0.85185185 0.81481481 0.82142857 0.78571429 0.78571429 0.82142857 0.71428571 0.85185185 0.85185185 0.81481481] mean value: 0.8113756613756613 key: train_recall value: [0.98387097 0.98790323 0.98785425 0.98380567 0.98785425 0.98380567 0.98380567 0.98790323 0.98387097 0.98387097] mean value: 0.9854544860911584 key: test_roc_auc value: [0.89144317 0.82120051 0.83928571 0.78571429 0.78571429 0.85714286 0.78571429 0.83664021 0.83664021 0.85383598] mean value: 0.829333150884875 key: train_roc_auc value: [0.99193548 0.99395161 0.99392713 0.99190283 0.99195862 0.99190283 0.99190283 0.99395161 0.99193548 0.99193548] mean value: 0.9925303926518784 key: test_jcc value: [0.79310345 0.6875 0.71875 0.64705882 0.64705882 0.74193548 0.625 0.71875 0.71875 0.73333333] mean value: 0.7031239912538987 key: train_jcc value: [0.98387097 0.98790323 0.98785425 0.98380567 0.98387097 0.98380567 0.98380567 0.98790323 0.98387097 0.98387097] mean value: 0.9850561577641374 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), 
('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 
'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.77394152 0.76272011 0.75519323 0.76581335 0.75918293 0.76116228 0.7646122 0.76457858 0.75582981 0.76644087] mean value: 0.7629474878311158 key: score_time value: [0.00938392 0.00930333 0.00939107 0.00936723 0.00953603 0.00941896 0.00916553 0.00935197 0.0094955 0.00944281] mean value: 0.009385633468627929 key: test_mcc value: [1. 0.9284802 0.89342711 0.93094934 0.85933785 0.96490128 0.96490128 0.89153439 1. 0.89139151] mean value: 0.9324922966413355 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.96428571 0.94642857 0.96428571 0.92857143 0.98214286 0.98214286 0.94545455 1. 0.94545455] mean value: 0.9658766233766234 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.96296296 0.94736842 0.96551724 0.93103448 0.98181818 0.98245614 0.94545455 1. 0.94339623] mean value: 0.9660008202192224 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96296296 0.93103448 0.93333333 0.9 1. 0.96551724 0.92857143 1. 0.96153846] mean value: 0.9582957910544118 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96296296 0.96428571 1. 0.96428571 0.96428571 1. 0.96296296 1. 0.92592593] mean value: 0.9744708994708995 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.9642401 0.94642857 0.96428571 0.92857143 0.98214286 0.98214286 0.9457672 1. 0.94510582] mean value: 0.9658684546615581 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.92857143 0.9 0.93333333 0.87096774 0.96428571 0.96551724 0.89655172 1. 0.89285714] mean value: 0.9352084326500345 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.29
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.05088902 0.03218055 0.03262019 0.03219128 0.03217649 0.03136945 0.03240466 0.03182459 0.03181601 0.04167891] mean value: 0.03491511344909668 key: score_time value: [0.01279211 0.01659989 0.01733136 0.0157187 0.01501036 0.01518536 0.01507902 0.01516318 0.0155859 0.01786971] mean value: 0.015633559226989745 key: test_mcc value: [-0.06429107 0.14858083 0.24743583 0.11547005 -0.13483997 0.26997462 0.43759497 0.2377336 0.27468517 0.26587302] mean value: 0.1798217053940527 key: train_mcc value: [0.34767983 0.45701651 0.45832971 0.36717254 0.37059129 0.44575314 0.44259167 0.57806698 0.51733606 0.33589566] mean value: 0.4320433400080996 key: test_accuracy value: [0.46428571 0.53571429 0.60714286 0.53571429 0.48214286 0.58928571 0.66071429 0.58181818 
0.61818182 0.58181818] mean value: 0.5656818181818182 key: train_accuracy value: [0.60479042 0.67065868 0.67065868 0.61477046 0.61676647 0.66267465 0.66067864 0.74900398 0.70916335 0.59760956] mean value: 0.6556774896422295 key: test_fscore value: [0.59459459 0.65789474 0.68571429 0.66666667 0.65060241 0.7012987 0.74666667 0.68493151 0.68656716 0.69333333] mean value: 0.6768270065783327 key: train_fscore value: [0.71469741 0.75037821 0.74962064 0.71906841 0.72011662 0.74509804 0.7439759 0.79742765 0.77258567 0.71060172] mean value: 0.7423570274505628 key: test_precision value: [0.46808511 0.51020408 0.57142857 0.52 0.49090909 0.55102041 0.59574468 0.54347826 0.575 0.54166667] mean value: 0.5367536866903855 key: train_precision value: [0.55605381 0.60048426 0.59951456 0.56136364 0.56264237 0.59375 0.59232614 0.6631016 0.62944162 0.55111111] mean value: 0.5909789120494734 key: test_recall value: [0.81481481 0.92592593 0.85714286 0.92857143 0.96428571 0.96428571 1. 0.92592593 0.85185185 0.96296296] mean value: 0.9195767195767196 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.47637292 0.54916986 0.60714286 0.53571429 0.48214286 0.58928571 0.66071429 0.58796296 0.6223545 0.58862434] mean value: 0.5699484583105273 key: train_roc_auc value: [0.60869565 0.67391304 0.67519685 0.62007874 0.62204724 0.66732283 0.66535433 0.7519685 0.71259843 0.6023622 ] mean value: 0.6599537829510441 key: test_jcc value: [0.42307692 0.49019608 0.52173913 0.5 0.48214286 0.54 0.59574468 0.52083333 0.52272727 0.53061224] mean value: 0.5127072520895565 key: train_jcc value: [0.55605381 0.60048426 0.59951456 0.56136364 0.56264237 0.59375 0.59232614 0.6631016 0.62944162 0.55111111] mean value: 0.5909789120494734 MCC on Blind test: -0.06 Accuracy on Blind test: 0.18 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, 
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03044271 0.01693082 0.0437572 0.02841759 0.04030895 0.03944254 0.0393815 0.03932357 0.03971505 0.03838658] mean value: 0.03561065196990967 key: score_time value: [0.02965569 0.01676869 0.03821087 0.01900721 0.0193789 0.01906633 0.02097154 0.01901102 0.01889944 0.01913977] mean value: 0.022010946273803712 key: test_mcc value: [0.96490128 0.85696041 0.89342711 0.71428571 0.67900461 0.85933785 0.71611487 0.71049701 0.79069197 0.8565805 ] mean value: 0.8041801324203861 key: train_mcc value: [0.86064046 0.86046475 0.86832207 0.84841579 0.88050876 0.86087775 0.85640062 0.86465808 0.86465808 0.86501334] mean value: 0.8629959687869436 key: test_accuracy value: [0.98214286 0.92857143 0.94642857 0.85714286 0.83928571 0.92857143 0.85714286 0.85454545 0.89090909 0.92727273] mean value: 0.9012012987012987 key: train_accuracy value: [0.93013972 0.93013972 0.93413174 0.9241517 0.94011976 0.93013972 0.92814371 0.93227092 0.93227092 0.93227092] mean value: 0.9313778816868256 key: test_fscore value: [0.98181818 0.92592593 0.94736842 0.85714286 0.84210526 0.92592593 0.86206897 0.84615385 0.89655172 0.92307692] mean value: 0.9008138033909359 key: train_fscore value: [0.9304175 0.93013972 0.93360161 0.92369478 0.94 0.9304175 0.92771084 0.932 0.932 0.93253968] mean value: 0.9312521625306114 key: test_precision value: [0.96428571 0.92592593 0.93103448 0.85714286 0.82758621 0.96153846 0.83333333 0.88 0.83870968 0.96 ] mean value: 0.8979556659300819 key: train_precision value: [0.91764706 0.92094862 0.928 0.91633466 0.92885375 0.9140625 0.92031873 0.92460317 0.92460317 
0.91796875] mean value: 0.9213340416025564 key: test_recall value: [1. 0.92592593 0.96428571 0.85714286 0.85714286 0.89285714 0.89285714 0.81481481 0.96296296 0.88888889] mean value: 0.9056878306878307 key: train_recall value: [0.94354839 0.93951613 0.93927126 0.93117409 0.951417 0.94736842 0.93522267 0.93951613 0.93951613 0.94758065] mean value: 0.9414130860650386 key: test_roc_auc value: [0.98275862 0.9284802 0.94642857 0.85714286 0.83928571 0.92857143 0.85714286 0.85383598 0.89219577 0.9265873 ] mean value: 0.9012429301222404 key: train_roc_auc value: [0.93027222 0.93023237 0.93420256 0.92424846 0.94027543 0.93037712 0.92824126 0.93235649 0.93235649 0.93245174] mean value: 0.9315014140293759 key: test_jcc value: [0.96428571 0.86206897 0.9 0.75 0.72727273 0.86206897 0.75757576 0.73333333 0.8125 0.85714286] mean value: 0.8226248320644872 key: train_jcc value: [0.86988848 0.86940299 0.8754717 0.85820896 0.88679245 0.86988848 0.86516854 0.87265918 0.87265918 0.87360595] mean value: 0.8713745882255924 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.2774384 0.29088688 0.29256916 0.34308195 0.30772591 0.31144476 0.29420948 0.30333471 0.28965688 0.29762864] mean value: 0.3007976770401001 key: score_time value: [0.02038169 0.01897693 0.01917744 0.02136087 0.01924849 0.01914406 0.01901841 0.01909137 0.0192101 0.01927757] mean value: 0.01948869228363037 key: test_mcc value: [0.96490128 0.85696041 0.89342711 0.71428571 0.67900461 0.85933785 0.71611487 0.71049701 0.79069197 0.8565805 ] mean value: 0.8041801324203861 key: train_mcc value: [0.86064046 0.86046475 0.86832207 0.84841579 0.88050876 0.86087775 0.85640062 0.86465808 0.86465808 0.89643787] mean value: 0.8661384215049459 key: test_accuracy value: [0.98214286 0.92857143 0.94642857 0.85714286 0.83928571 0.92857143 0.85714286 0.85454545 0.89090909 0.92727273] mean value: 0.9012012987012987 key: train_accuracy value: [0.93013972 0.93013972 0.93413174 0.9241517 0.94011976 0.93013972 0.92814371 0.93227092 0.93227092 0.94820717] mean value: 0.9329715071848336 key: test_fscore value: [0.98181818 0.92592593 0.94736842 0.85714286 0.84210526 0.92592593 0.86206897 0.84615385 0.89655172 0.92307692] mean value: 0.9008138033909359 key: train_fscore value: [0.9304175 0.93013972 0.93360161 0.92369478 0.94 0.9304175 0.92771084 0.932 0.932 0.94779116] mean value: 0.9327773107425066 key: test_precision value: [0.96428571 0.92592593 0.93103448 0.85714286 0.82758621 0.96153846 0.83333333 0.88 0.83870968 0.96 ] mean value: 0.8979556659300819 key: train_precision value: [0.91764706 0.92094862 0.928 0.91633466 0.92885375 0.9140625 0.92031873 0.92460317 0.92460317 0.944 ] mean value: 0.9239371666025564 key: test_recall value: [1. 0.92592593 0.96428571 0.85714286 0.85714286 0.89285714 0.89285714 0.81481481 0.96296296 0.88888889] mean value: 0.9056878306878307 key: train_recall value: [0.94354839 0.93951613 0.93927126 0.93117409 0.951417 0.94736842 0.93522267 0.93951613 0.93951613 0.9516129 ] mean value: 0.9418163118714902 key: test_roc_auc value: [0.98275862 0.9284802 0.94642857 0.85714286 0.83928571 0.92857143 0.85714286 0.85383598 0.89219577 0.9265873 ] mean value: 0.9012429301222404 key: train_roc_auc value: [0.93027222 0.93023237 0.93420256 0.92424846 0.94027543 0.93037712 0.92824126 0.93235649 0.93235649 0.9482474 ] mean value: 0.9330809796885072 key: test_jcc value: [0.96428571 0.86206897 0.9 0.75 0.72727273 0.86206897 0.75757576 0.73333333 0.8125 0.85714286] mean value: 0.8226248320644872 key: train_jcc value: [0.86988848 0.86940299 0.8754717 0.85820896 0.88679245 0.86988848 0.86516854 0.87265918 0.87265918 0.90076336] mean value: 0.874090329307916 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:114: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:117: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV',
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough',
                  transformers=[('num', MinMaxScaler(),
                                 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)),
                                ('cat', OneHotEncoder(),
                                 Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0369215 0.03516078 0.03686929 0.04687595 0.03705096 0.06197238 0.03686905 0.03668141 0.03720927 0.03774619]
mean value: 0.040335679054260255

key: score_time
value: [0.01524997 0.01199031 0.01500368 0.01204038 0.01474452 0.01514792 0.0146656 0.01476455 0.01470566 0.014925 ]
mean value: 0.014323759078979491

key: test_mcc
value: [0.92980296 0.86189955 0.79161589 0.79110556 0.71611487 0.8660254 0.68250015 0.78571429 0.75047877 0.85933785]
mean value: 0.8034595290181991

key: train_mcc
value: [0.8383186 0.86982976 0.86199599 0.87376677 0.87412415 0.84662074 0.8543903 0.86221141 0.87444958 0.86237183]
mean value: 0.8618079125330999

key: test_accuracy
value: [0.96491228 0.92982456 0.89473684 0.89473684 0.85714286 0.92857143 0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.9005639097744361

key: train_accuracy
value: [0.91913215 0.93491124 0.93096647 0.93688363 0.93700787 0.92322835 0.92716535 0.93110236 0.93700787 0.93110236]
mean value: 0.9308507664352607

key: test_fscore
value: [0.96428571 0.93103448 0.89285714 0.9 0.86206897 0.92307692 0.84745763 0.89285714 0.87719298 0.92592593]
mean value: 0.9016756906853496

key: train_fscore
value: [0.91976517 0.93491124 0.93123772 0.93675889 0.9375 0.92397661 0.92759295 0.93096647 0.9379845 0.93177388]
mean value: 0.9312467431117992

key: test_precision
value: [0.96428571 0.9 0.92592593 0.87096774 0.83333333 1. 0.80645161 0.89285714 0.86206897 0.96153846]
mean value: 0.9017428898296529

key: train_precision
value: [0.91439689 0.93675889 0.92578125 0.93675889 0.93023256 0.91505792 0.92217899 0.93280632 0.92366412 0.92277992]
mean value: 0.9260415754273096

key: test_recall
value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.85714286 0.89285714 0.89285714 0.89285714 0.89285714]
mean value: 0.9043103448275862

key: train_recall
value: [0.92519685 0.93307087 0.93675889 0.93675889 0.94488189 0.93307087 0.93307087 0.92913386 0.95275591 0.94094488]
mean value: 0.9365643770813233

key: test_roc_auc
value: [0.96490148 0.93041872 0.8953202 0.89408867 0.85714286 0.92857143 0.83928571 0.89285714 0.875 0.92857143]
mean value: 0.9006157635467981

key: train_roc_auc
value: [0.91912016 0.93491488 0.93097787 0.93688338 0.93700787 0.92322835 0.92716535 0.93110236 0.93700787 0.93110236]
mean value: 0.9308510472752172

key: test_jcc
value: [0.93103448 0.87096774 0.80645161 0.81818182 0.75757576 0.85714286 0.73529412 0.80645161 0.78125 0.86206897]
mean value: 0.8226418966565289

key: train_jcc
value: [0.85144928 0.87777778 0.87132353 0.88104089 0.88235294 0.85869565 0.8649635 0.87084871 0.88321168 0.87226277]
mean value: 0.8713926732787018

MCC on Blind test: 0.29
Accuracy on Blind test: 0.7

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.86660838 1.03593755 0.85590649 1.07513618 0.93153334 1.0510838 1.23593092 1.05089116 1.02004242 0.89820933] mean value: 1.0021279573440551 key: score_time value: [0.01472855 0.01553202 0.01536512 0.02065921 0.01555824 0.01580501 0.01234865 0.01632595 0.0153687 0.01226258] mean value: 0.015395402908325195 key: test_mcc value: [0.92980296 0.82512315 0.85960591 0.75462449 0.71611487 0.8660254 0.64450339 0.78571429 0.75047877 0.85933785] mean value: 0.7991331087863264 key: train_mcc value: [0.89349849 0.89349683 0.94488889 0.94488889 0.90158179 0.8819171 0.8307151 0.88976378 0.90191737 0.81889764] mean value: 0.8901565879858516 key: test_accuracy value: [0.96491228 0.9122807 0.92982456 0.87719298 0.85714286 0.92857143 0.82142857 0.89285714 0.875 0.92857143] mean value: 0.8987781954887218 key: train_accuracy value: [0.94674556 0.94674556 0.97238659 0.97238659 0.9507874 0.94094488 0.91535433 
0.94488189 0.9507874 0.90944882] mean value: 0.945046902421221 key: test_fscore value: [0.96428571 0.9122807 0.93103448 0.88135593 0.86206897 0.92307692 0.82758621 0.89285714 0.87719298 0.92592593] mean value: 0.8997664977732036 key: train_fscore value: [0.94674556 0.94695481 0.97211155 0.97211155 0.95088409 0.94117647 0.91552063 0.94488189 0.95145631 0.90944882] mean value: 0.9451291688116392 key: test_precision value: [0.96428571 0.89655172 0.93103448 0.86666667 0.83333333 1. 0.8 0.89285714 0.86206897 0.96153846] mean value: 0.9008336491095112 key: train_precision value: [0.9486166 0.94509804 0.97991968 0.97991968 0.94901961 0.9375 0.91372549 0.94488189 0.93869732 0.90944882] mean value: 0.9446827122144215 key: test_recall value: [0.96428571 0.92857143 0.93103448 0.89655172 0.89285714 0.85714286 0.85714286 0.89285714 0.89285714 0.89285714] mean value: 0.9006157635467981 key: train_recall value: [0.94488189 0.9488189 0.96442688 0.96442688 0.95275591 0.94488189 0.91732283 0.94488189 0.96456693 0.90944882] mean value: 0.9456412810058822 key: test_roc_auc value: [0.96490148 0.91256158 0.92980296 0.87684729 0.85714286 0.92857143 0.82142857 0.89285714 0.875 0.92857143] mean value: 0.898768472906404 key: train_roc_auc value: [0.94674925 0.94674146 0.97237092 0.97237092 0.9507874 0.94094488 0.91535433 0.94488189 0.9507874 0.90944882] mean value: 0.9450437272416047 key: test_jcc value: [0.93103448 0.83870968 0.87096774 0.78787879 0.75757576 0.85714286 0.70588235 0.80645161 0.78125 0.86206897] mean value: 0.8198962236072506 key: train_jcc value: [0.8988764 0.89925373 0.94573643 0.94573643 0.90636704 0.88888889 0.8442029 0.89552239 0.90740741 0.83393502] mean value: 0.8965926646210486 MCC on Blind test: 0.28 Accuracy on Blind test: 0.69 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01588368 0.01141286 0.01038861 0.0103476 0.01009512 0.01025605 0.01078176 0.01141858 0.0107193 0.01143622] mean value: 0.011273980140686035 key: score_time value: [0.01116276 0.00949693 0.0092299 0.00901246 0.00896025 0.00897455 0.00905919 0.00940347 0.00912666 0.00908256] mean value: 0.009350872039794922 key: test_mcc value: [0.8615634 0.55091314 0.7589669 0.59358067 0.46697379 0.4472136 0.79385662 0.50128041 0.54446551 0.72168784] mean value: 0.6240501864920316 key: train_mcc value: [0.65683536 0.65767923 0.64505247 0.67103176 0.64226244 0.66894588 0.66816241 0.70393683 0.66020809 0.66621692] mean value: 0.6640331375671215 key: test_accuracy value: [0.92982456 0.77192982 0.87719298 0.78947368 0.73214286 0.71428571 0.89285714 0.75 0.76785714 0.85714286] mean value: 0.8082706766917294 key: train_accuracy value: [0.82445759 0.82445759 0.81854043 0.83234714 0.80905512 0.83070866 0.83070866 0.8503937 0.82677165 0.82874016] mean value: 0.8276180714097128 key: test_fscore value: [0.92592593 0.74509804 0.87272727 0.76923077 0.71698113 0.66666667 0.88461538 0.74074074 0.74509804 0.84615385] mean value: 0.7913237816567451 key: train_fscore value: [0.81023454 0.80942184 0.80257511 0.81953291 0.77904328 0.81702128 0.81779661 0.84297521 0.81355932 0.8137045 ] mean value: 0.8125864591501547 key: test_precision value: [0.96153846 0.82608696 0.92307692 0.86956522 0.76 0.8 0.95833333 0.76923077 0.82608696 0.91666667] mean 
value: 0.8610585284280936 key: train_precision value: [0.88372093 0.88732394 0.87793427 0.8853211 0.92432432 0.88888889 0.8853211 0.88695652 0.88073394 0.89201878] mean value: 0.8892543807279057 key: test_recall value: [0.89285714 0.67857143 0.82758621 0.68965517 0.67857143 0.57142857 0.82142857 0.71428571 0.67857143 0.78571429] mean value: 0.7338669950738916 key: train_recall value: [0.7480315 0.74409449 0.73913043 0.76284585 0.67322835 0.75590551 0.75984252 0.80314961 0.75590551 0.7480315 ] mean value: 0.7490165260962933 key: test_roc_auc value: [0.92918719 0.7703202 0.87807882 0.79125616 0.73214286 0.71428571 0.89285714 0.75 0.76785714 0.85714286] mean value: 0.8083128078817734 key: train_roc_auc value: [0.82460863 0.82461641 0.81838412 0.83221033 0.80905512 0.83070866 0.83070866 0.8503937 0.82677165 0.82874016] mean value: 0.8276197441722947 key: test_jcc value: [0.86206897 0.59375 0.77419355 0.625 0.55882353 0.5 0.79310345 0.58823529 0.59375 0.73333333] mean value: 0.6622258119042945 key: train_jcc value: [0.68100358 0.67985612 0.6702509 0.6942446 0.6380597 0.69064748 0.69175627 0.72857143 0.68571429 0.68592058] mean value: 0.6846024947522601 MCC on Blind test: 0.33 Accuracy on Blind test: 0.75 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01078367 0.01111078 0.010741 0.01050091 0.01050091 0.01088929 0.01170611 0.01178074 0.01180744 0.01147151] mean value: 0.011129236221313477 key: score_time value: [0.00919318 0.00924015 0.00906968 0.00900912 0.00903487 0.00986242 0.00924635 0.00929976 0.00924873 0.00980163] mean value: 0.009300589561462402 key: test_mcc value: [0.8953202 0.7589669 0.71921182 0.59358067 0.60753044 0.75047877 0.67900461 0.67900461 0.53881591 0.75047877] mean value: 0.6972392691166773 key: train_mcc value: [0.75161488 0.72796178 0.73570284 0.72807243 0.74414639 0.74805469 0.75599926 0.75197433 0.74812427 0.76055607] mean value: 0.7452206943433384 key: test_accuracy value: [0.94736842 0.87719298 0.85964912 0.78947368 0.80357143 0.875 0.83928571 0.83928571 0.76785714 0.875 ] mean value: 0.8473684210526315 key: train_accuracy value: [0.87573964 0.86390533 0.8678501 0.86390533 0.87204724 0.87401575 0.87795276 0.87598425 0.87401575 0.87992126] mean value: 0.8725337402351333 key: test_fscore value: [0.94736842 0.88135593 0.86206897 0.76923077 0.8 0.87719298 0.84210526 0.83636364 0.77966102 0.87719298] mean value: 0.8472539969386996 key: train_fscore value: [0.87719298 0.86282306 0.86732673 0.86172345 0.87128713 0.8745098 0.87698413 0.8762279 0.87301587 0.88246628] mean value: 0.8723557335436966 key: test_precision value: [0.93103448 0.83870968 0.86206897 0.86956522 0.81481481 0.86206897 0.82758621 0.85185185 0.74193548 0.86206897] mean value: 0.846170463155519 key: train_precision value: [0.86872587 0.87148594 0.86904762 0.87398374 0.87649402 0.87109375 0.884 0.8745098 0.88 0.86415094] mean 
value: 0.8733491692608164 key: test_recall value: [0.96428571 0.92857143 0.86206897 0.68965517 0.78571429 0.89285714 0.85714286 0.82142857 0.82142857 0.89285714] mean value: 0.8516009852216748 key: train_recall value: [0.88582677 0.85433071 0.86561265 0.84980237 0.86614173 0.87795276 0.87007874 0.87795276 0.86614173 0.9015748 ] mean value: 0.8715415019762845 key: test_roc_auc value: [0.9476601 0.87807882 0.85960591 0.79125616 0.80357143 0.875 0.83928571 0.83928571 0.76785714 0.875 ] mean value: 0.8476600985221675 key: train_roc_auc value: [0.87571971 0.86392425 0.86784569 0.86387756 0.87204724 0.87401575 0.87795276 0.87598425 0.87401575 0.87992126] mean value: 0.8725304223335719 key: test_jcc value: [0.9 0.78787879 0.75757576 0.625 0.66666667 0.78125 0.72727273 0.71875 0.63888889 0.78125 ] mean value: 0.7384532828282828 key: train_jcc value: [0.78125 0.75874126 0.76573427 0.75704225 0.77192982 0.77700348 0.78091873 0.77972028 0.77464789 0.78965517] mean value: 0.7736643154251823 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00963664 0.01039338 0.01091814 0.00967956 0.00986576 0.00981426 0.01098919 0.01109385 0.011096 0.01093125] mean value: 0.010441803932189941 key: score_time value: [0.01654792 0.01324368 0.0126996 0.01283431 0.01158905 0.01262879 0.01270723 0.01319766 0.01274705 0.0123148 ] mean value: 0.013051009178161621 key: test_mcc value: [0.69397486 0.40447771 0.37345948 0.54592083 0.42966892 0.61065803 0.5728919 0.47187011 0.57142857 0.71611487] mean value: 0.5390465286842666 key: train_mcc value: [0.68850776 0.72786691 0.66880967 0.70027393 0.7170914 0.69728825 0.71286919 0.68506061 0.69728825 0.68988054] mean value: 0.6984936522715344 key: test_accuracy value: [0.84210526 0.70175439 0.68421053 0.77192982 0.71428571 0.80357143 0.78571429 0.73214286 0.78571429 0.85714286] mean value: 0.7678571428571428 key: train_accuracy value: [0.84418146 0.86390533 0.83431953 0.85009862 0.85826772 0.8484252 0.85629921 0.84251969 0.8484252 0.84448819] mean value: 0.8490930127816864 key: test_fscore value: [0.82352941 0.67924528 0.66666667 0.78688525 0.7037037 0.79245283 0.79310345 0.70588235 0.78571429 0.85185185] mean value: 0.7589035080027439 key: train_fscore value: [0.84294235 0.86336634 0.832 0.84860558 0.85542169 0.84569138 0.85429142 0.84189723 0.84569138 0.84040404] mean value: 0.84703114032967 key: test_precision value: [0.91304348 0.72 0.72 0.75 0.73076923 0.84 0.76666667 0.7826087 0.78571429 0.88461538] mean value: 0.7893417741678611 key: train_precision value: [0.85140562 0.8685259 0.84210526 0.85542169 0.87295082 0.86122449 0.86639676 0.8452381 0.86122449 0.86307054] mean 
value: 0.8587563663863939 key: test_recall value: [0.75 0.64285714 0.62068966 0.82758621 0.67857143 0.75 0.82142857 0.64285714 0.78571429 0.82142857] mean value: 0.7341133004926108 key: train_recall value: [0.83464567 0.85826772 0.82213439 0.84189723 0.83858268 0.83070866 0.84251969 0.83858268 0.83070866 0.81889764] mean value: 0.8356945006380131 key: test_roc_auc value: [0.84051724 0.70073892 0.68534483 0.77093596 0.71428571 0.80357143 0.78571429 0.73214286 0.78571429 0.85714286] mean value: 0.7676108374384236 key: train_roc_auc value: [0.84420031 0.86391647 0.83429554 0.85008247 0.85826772 0.8484252 0.85629921 0.84251969 0.8484252 0.84448819] mean value: 0.8490919983816252 key: test_jcc value: [0.7 0.51428571 0.5 0.64864865 0.54285714 0.65625 0.65714286 0.54545455 0.64705882 0.74193548] mean value: 0.6153633215789288 key: train_jcc value: [0.72852234 0.75958188 0.71232877 0.73702422 0.74736842 0.73263889 0.7456446 0.72696246 0.73263889 0.72473868] mean value: 0.7347449138309052 MCC on Blind test: 0.25 Accuracy on Blind test: 0.68 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0228858 0.02311659 0.02236342 0.02557158 0.02441049 0.02272844 0.02228618 0.02230501 0.02276373 0.02649665] mean value: 0.02349278926849365 key: score_time value: [0.01345658 0.01215935 0.01206708 0.01320601 0.01286006 0.01210189 0.0122695 0.0118916 0.01202822 0.01279545] mean value: 0.012483572959899903 key: test_mcc value: [0.8953202 0.8953202 0.85960591 0.75462449 0.75047877 0.78772636 0.67900461 0.71611487 0.67900461 0.85933785] mean value: 0.7876537869681253 key: train_mcc value: [0.78304104 0.79093074 0.79093399 0.80276057 0.80317451 0.79926835 0.80709287 0.80714291 0.81104876 0.79139378] mean value: 0.7986787522112317 key: test_accuracy value: [0.94736842 0.94736842 0.92982456 0.87719298 0.875 0.89285714 0.83928571 0.85714286 0.83928571 0.92857143] mean value: 0.8933897243107769 key: train_accuracy value: [0.89151874 0.89546351 0.89546351 0.90138067 0.9015748 0.8996063 0.90354331 0.90354331 0.90551181 0.89566929] mean value: 0.8993275248877913 key: test_fscore value: [0.94736842 0.94736842 0.93103448 0.88135593 0.87719298 0.88888889 0.84210526 0.85185185 0.84210526 0.92592593] mean value: 0.893519743250587 key: train_fscore value: [0.89194499 0.89587426 0.89546351 0.90118577 0.90196078 0.90019569 0.90373281 0.90410959 0.90588235 0.8962818 ] mean value: 0.8996631565871114 key: test_precision value: [0.93103448 0.93103448 0.93103448 0.86666667 0.86206897 0.92307692 0.82758621 0.88461538 0.82758621 0.96153846] mean value: 0.8946242263483642 key: train_precision value: [0.89019608 0.89411765 0.89370079 0.90118577 0.8984375 0.89494163 0.90196078 0.89883268 
0.90234375 0.89105058] mean value: 0.896676722068022 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286 0.85714286 0.82142857 0.85714286 0.89285714] mean value: 0.8934729064039408 key: train_recall value: [0.89370079 0.8976378 0.8972332 0.90118577 0.90551181 0.90551181 0.90551181 0.90944882 0.90944882 0.9015748 ] mean value: 0.9026765429024929 key: test_roc_auc value: [0.9476601 0.9476601 0.92980296 0.87684729 0.875 0.89285714 0.83928571 0.85714286 0.83928571 0.92857143] mean value: 0.8934113300492612 key: train_roc_auc value: [0.89151443 0.89545921 0.89546699 0.90138029 0.9015748 0.8996063 0.90354331 0.90354331 0.90551181 0.89566929] mean value: 0.8993269739503906 key: test_jcc value: [0.9 0.9 0.87096774 0.78787879 0.78125 0.8 0.72727273 0.74193548 0.72727273 0.86206897] mean value: 0.8098646433747936 key: train_jcc value: [0.80496454 0.8113879 0.81071429 0.82014388 0.82142857 0.81850534 0.82437276 0.825 0.82795699 0.81205674] mean value: 0.8176531006168795 MCC on Blind test: 0.23 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.75838876 1.88278365 1.92518592 1.9501245 1.97273588 2.06173635 2.07129908 1.80369496 1.98173571 1.86790895] mean value: 1.9275593757629395 key: score_time value: [0.01259828 0.02442145 0.01467299 0.01875997 0.01800513 0.01487517 0.01815009 0.01271486 0.0190661 0.0148983 ] mean value: 0.016816234588623045 key: test_mcc value: [0.82512315 0.68434084 0.82490815 0.7257422 0.71611487 0.85933785 0.71611487 0.75047877 0.71611487 0.78571429] mean value: 0.7603989877183843 key: train_mcc value: [0.98823511 0.99214142 0.99211042 0.99211042 0.98038334 0.98428248 0.98428248 0.98050495 0.99607071 0.99215674] mean value: 0.9882278095502359 key: test_accuracy value: [0.9122807 0.84210526 0.9122807 0.85964912 0.85714286 0.92857143 0.85714286 0.875 0.85714286 0.89285714] mean value: 0.8794172932330827 key: train_accuracy value: [0.99408284 0.99605523 0.99605523 0.99605523 0.99015748 0.99212598 0.99212598 0.99015748 0.9980315 0.99606299] mean value: 0.9940909938032894 key: test_fscore value: [0.9122807 0.83636364 0.91525424 0.87096774 0.86206897 0.92592593 0.86206897 0.87272727 0.86206897 0.89285714] mean value: 0.8812583555403708 key: train_fscore value: [0.99405941 0.99604743 0.99604743 0.99604743 0.99009901 
0.99209486 0.99209486 0.99005964 0.99802761 0.99607843] mean value: 0.9940656118583756 key: test_precision value: [0.89655172 0.85185185 0.9 0.81818182 0.83333333 0.96153846 0.83333333 0.88888889 0.83333333 0.89285714] mean value: 0.8709869887456094 key: train_precision value: [1. 1. 0.99604743 0.99604743 0.99601594 0.99603175 0.99603175 1. 1. 0.9921875 ] mean value: 0.9972361789978551 key: test_recall value: [0.92857143 0.82142857 0.93103448 0.93103448 0.89285714 0.89285714 0.89285714 0.85714286 0.89285714 0.89285714] mean value: 0.8933497536945813 key: train_recall value: [0.98818898 0.99212598 0.99604743 0.99604743 0.98425197 0.98818898 0.98818898 0.98031496 0.99606299 1. ] mean value: 0.9909417696305749 key: test_roc_auc value: [0.91256158 0.84174877 0.91194581 0.85837438 0.85714286 0.92857143 0.85714286 0.875 0.85714286 0.89285714] mean value: 0.8792487684729065 key: train_roc_auc value: [0.99409449 0.99606299 0.99605521 0.99605521 0.99015748 0.99212598 0.99212598 0.99015748 0.9980315 0.99606299] mean value: 0.9940929320593819 key: test_jcc value: [0.83870968 0.71875 0.84375 0.77142857 0.75757576 0.86206897 0.75757576 0.77419355 0.75757576 0.80645161] mean value: 0.7888079648382763 key: train_jcc value: [0.98818898 0.99212598 0.99212598 0.99212598 0.98039216 0.98431373 0.98431373 0.98031496 0.99606299 0.9921875 ] mean value: 0.9882151989732901 MCC on Blind test: 0.23 Accuracy on Blind test: 0.66 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03890681 0.02280307 0.02086091 0.0209384 0.01961708 0.02346158 0.02328086 0.02160168 0.02298188 0.0219512 ] mean value: 0.02364034652709961 key: score_time value: [0.01179671 0.00914145 0.0088346 0.00926948 0.00878859 0.00875878 0.00874162 0.00876069 0.00872087 0.00878668] mean value: 0.00915994644165039 key: test_mcc value: [0.96547546 0.82512315 0.8953202 0.92980296 0.85714286 0.92857143 0.89342711 0.85933785 0.93094934 0.85933785] mean value: 0.894448819396392 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.9122807 0.94736842 0.96491228 0.92857143 0.96428571 0.94642857 0.92857143 0.96428571 0.92857143] mean value: 0.9467731829573934 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.9122807 0.94736842 0.96551724 0.92857143 0.96428571 0.94736842 0.93103448 0.96296296 0.93103448] mean value: 0.9472242038394488 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.89655172 0.96428571 0.96551724 0.92857143 0.96428571 0.93103448 0.9 1. 0.9 ] mean value: 0.9450246305418719 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.92857143 0.93103448 0.96551724 0.92857143 0.96428571 0.96428571 0.96428571 0.92857143 0.96428571] mean value: 0.9503694581280788 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98214286 0.91256158 0.9476601 0.96490148 0.92857143 0.96428571 0.94642857 0.92857143 0.96428571 0.92857143] mean value: 0.9467980295566503 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.83870968 0.9 0.93333333 0.86666667 0.93103448 0.9 0.87096774 0.92857143 0.87096774] mean value: 0.9004536786906087 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.47 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12795711 0.12534237 0.12668014 0.12615395 0.12666821 0.12553144 0.12423468 0.12471223 0.12423205 0.12511349] mean value: 0.12566256523132324 key: score_time value: [0.01779485 0.01778674 0.01880646 0.01766038 0.01766086 0.01774502 0.0188868 0.01795578 0.01807523 0.01787972] mean value: 0.01802518367767334 key: test_mcc value: [0.92980296 0.64901478 0.82512315 0.75462449 0.78772636 0.85933785 0.85933785 0.71428571 0.71611487 0.82195294] mean value: 0.791732097228972 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.96491228 0.8245614 0.9122807 0.87719298 0.89285714 0.92857143 0.92857143 0.85714286 0.85714286 0.91071429] mean value: 0.8953947368421052 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.82142857 0.9122807 0.88135593 0.89655172 0.92592593 0.93103448 0.85714286 0.86206897 0.90909091] mean value: 0.8961165784245547 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.82142857 0.92857143 0.86666667 0.86666667 0.96153846 0.9 0.85714286 0.83333333 0.92592593] mean value: 0.8925559625559626 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.82142857 0.89655172 0.89655172 0.92857143 0.89285714 0.96428571 0.85714286 0.89285714 0.89285714] mean value: 0.9007389162561577 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96490148 0.82450739 0.91256158 0.87684729 0.89285714 0.92857143 0.92857143 0.85714286 0.85714286 0.91071429] mean value: 0.8953817733990148 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.6969697 0.83870968 0.78787879 0.8125 0.86206897 0.87096774 0.75 0.75757576 0.83333333] mean value: 0.8141038443388277 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.72 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0101614 0.01020575 0.01025558 0.01031733 0.0105567 0.01035643 0.01047182 0.01056862 0.01091957 0.01059294] mean value: 0.010440611839294433 key: score_time value: [0.00886583 0.00881696 0.00859666 0.00871801 0.00875425 0.0088892 0.00892472 0.00889111 0.00875926 0.00875688] mean value: 0.008797287940979004 key: test_mcc value: [0.68736396 0.47413793 0.57973205 0.62473685 0.28644595 0.53605627 0.32163376 0.36084392 0.58501794 0.47187011] mean value: 0.49278387214833563 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84210526 0.73684211 0.78947368 0.80701754 0.64285714 0.76785714 0.66071429 0.67857143 0.78571429 0.73214286] mean value: 0.7443295739348371 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83018868 0.73684211 0.8 0.79245283 0.65517241 0.77192982 0.65454545 0.7 0.80645161 0.70588235] mean value: 0.7453465273441484 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.88 0.72413793 0.77419355 0.875 0.63333333 0.75862069 0.66666667 0.65625 0.73529412 0.7826087 ] mean value: 0.7486104982375985 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.78571429 0.75 0.82758621 0.72413793 0.67857143 0.78571429 0.64285714 0.75 0.89285714 0.64285714] mean value: 0.7480295566502463 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.841133 0.73706897 0.7887931 0.80849754 0.64285714 0.76785714 0.66071429 0.67857143 0.78571429 0.73214286] mean value: 0.7443349753694581 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.70967742 0.58333333 0.66666667 0.65625 0.48717949 0.62857143 0.48648649 0.53846154 0.67567568 0.54545455] mean value: 0.5977756581184 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.61 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.86948252 1.84095502 1.86450291 1.85053563 1.83855367 1.95213008 1.8649447 1.85131979 1.8796494 1.86519742] mean value: 1.8677271127700805 key: score_time value: [0.093261 0.09254313 0.0949862 0.09473634 0.09973621 0.10174298 0.09618306 0.09803486 0.09973669 0.09787321] mean value: 0.09688336849212646 key: test_mcc value: [0.96547546 0.8953202 0.92980296 0.85960591 0.85714286 1. 0.96490128 0.89342711 0.93094934 0.89342711] mean value: 0.9190052220037386 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.94736842 0.96491228 0.92982456 0.92857143 1. 0.98214286 0.94642857 0.96428571 0.94642857] mean value: 0.9592418546365915 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.94736842 0.96551724 0.93103448 0.92857143 1. 0.98245614 0.94736842 0.96296296 0.94736842] mean value: 0.9594465700999276 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93103448 0.96551724 0.93103448 0.92857143 1. 0.96551724 0.93103448 1. 0.93103448] mean value: 0.9583743842364532 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 0.93103448 0.92857143 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.9610837438423645 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.9476601 0.96490148 0.92980296 0.92857143 1. 
0.98214286 0.94642857 0.96428571 0.94642857] mean value: 0.9592364532019705 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.9 0.93333333 0.87096774 0.86666667 1. 0.96551724 0.9 0.92857143 0.9 ] mean value: 0.9229342126171938 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.00257015 0.97476339 0.95755458 1.02234483 0.98715186 0.97197175 0.99527335 1.01324677 0.94979787 1.04136539] mean value: 0.9916039943695069 key: score_time value: [0.21403074 0.20837617 0.2156167 0.25913739 0.2587049 0.23948455 0.25020814 0.22535872 0.18812346 0.26523614] mean value: 0.23242769241333008 key: test_mcc value: [0.96547546 0.8953202 0.8953202 0.85960591 0.85714286 0.93094934 0.92857143 0.82195294 0.93094934 0.89342711] mean value: 0.8978714775896186 key: train_mcc value: [0.95266254 0.96450413 0.94872473 0.96055211 0.9606597 0.94882625 0.94882625 0.95670033 0.94882625 0.95675965] mean value: 0.9547041949463222 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.92982456 0.92857143 0.96428571 0.96428571 0.91071429 0.96428571 0.94642857] mean value: 0.9485588972431077 key: train_accuracy value: [0.97633136 0.98224852 0.97435897 0.98027613 0.98031496 0.97440945 0.97440945 0.97834646 0.97440945 0.97834646] mean value: 0.9773451210610509 key: test_fscore value: [0.98181818 0.94736842 0.94736842 0.93103448 0.92857143 0.96551724 0.96428571 0.90909091 0.96296296 0.94736842] mean value: 0.9485386184025023 key: train_fscore value: [0.97637795 0.98231827 0.97425743 0.98023715 0.98039216 0.97445972 0.97445972 0.97830375 0.97445972 0.97847358] mean value: 0.9773739464231741 key: test_precision value: [1. 0.93103448 0.96428571 0.93103448 0.92857143 0.93333333 0.96428571 0.92592593 1. 
0.93103448] mean value: 0.9509505564677978 key: train_precision value: [0.97637795 0.98039216 0.97619048 0.98023715 0.9765625 0.97254902 0.97254902 0.98023715 0.97254902 0.97276265] mean value: 0.9760407098847448 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1. 0.96428571 0.89285714 0.92857143 0.96428571] mean value: 0.9469211822660099 key: train_recall value: [0.97637795 0.98425197 0.97233202 0.98023715 0.98425197 0.97637795 0.97637795 0.97637795 0.97637795 0.98425197] mean value: 0.9787214839251813 key: test_roc_auc value: [0.98214286 0.9476601 0.9476601 0.92980296 0.92857143 0.96428571 0.96428571 0.91071429 0.96428571 0.94642857] mean value: 0.9485837438423645 key: train_roc_auc value: [0.97633127 0.98224456 0.97435498 0.98027606 0.98031496 0.97440945 0.97440945 0.97834646 0.97440945 0.97834646] mean value: 0.9773443092340731 key: test_jcc value: [0.96428571 0.9 0.9 0.87096774 0.86666667 0.93333333 0.93103448 0.83333333 0.92857143 0.9 ] mean value: 0.9028192700884581 key: train_jcc value: [0.95384615 0.96525097 0.94980695 0.96124031 0.96153846 0.95019157 0.95019157 0.95752896 0.95019157 0.95785441] mean value: 0.9557640916822954 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02652383 0.0104022 0.01039076 0.01055098 0.01046586 0.01057124 0.01046538 0.010535 0.01041842 0.01369309] mean value: 0.012401676177978516 key: score_time value: [0.01325512 0.00898743 0.00907898 0.00889564 0.0088706 0.00887895 0.0090363 0.00889659 0.00893497 0.00981998] mean value: 0.009465456008911133 key: test_mcc value: [0.8953202 0.7589669 0.71921182 0.59358067 0.60753044 0.75047877 0.67900461 0.67900461 0.53881591 0.75047877] mean value: 0.6972392691166773 key: train_mcc value: [0.75161488 0.72796178 0.73570284 0.72807243 0.74414639 0.74805469 0.75599926 0.75197433 0.74812427 0.76055607] mean value: 0.7452206943433384 key: test_accuracy value: [0.94736842 0.87719298 0.85964912 0.78947368 0.80357143 0.875 0.83928571 0.83928571 0.76785714 0.875 ] mean value: 0.8473684210526315 key: train_accuracy value: [0.87573964 0.86390533 0.8678501 0.86390533 0.87204724 0.87401575 0.87795276 0.87598425 0.87401575 0.87992126] mean value: 0.8725337402351333 key: test_fscore value: [0.94736842 0.88135593 0.86206897 0.76923077 0.8 0.87719298 0.84210526 0.83636364 0.77966102 0.87719298] mean value: 0.8472539969386996 key: train_fscore value: [0.87719298 0.86282306 0.86732673 0.86172345 0.87128713 0.8745098 0.87698413 0.8762279 0.87301587 0.88246628] mean value: 0.8723557335436966 key: test_precision value: [0.93103448 0.83870968 0.86206897 0.86956522 0.81481481 0.86206897 0.82758621 0.85185185 0.74193548 0.86206897] mean value: 0.846170463155519 key: train_precision value: [0.86872587 0.87148594 0.86904762 0.87398374 0.87649402 0.87109375 0.884 0.8745098 0.88 0.86415094] mean value: 
0.8733491692608164 key: test_recall value: [0.96428571 0.92857143 0.86206897 0.68965517 0.78571429 0.89285714 0.85714286 0.82142857 0.82142857 0.89285714] mean value: 0.8516009852216748 key: train_recall value: [0.88582677 0.85433071 0.86561265 0.84980237 0.86614173 0.87795276 0.87007874 0.87795276 0.86614173 0.9015748 ] mean value: 0.8715415019762845 key: test_roc_auc value: [0.9476601 0.87807882 0.85960591 0.79125616 0.80357143 0.875 0.83928571 0.83928571 0.76785714 0.875 ] mean value: 0.8476600985221675 key: train_roc_auc value: [0.87571971 0.86392425 0.86784569 0.86387756 0.87204724 0.87401575 0.87795276 0.87598425 0.87401575 0.87992126] mean value: 0.8725304223335719 key: test_jcc value: [0.9 0.78787879 0.75757576 0.625 0.66666667 0.78125 0.72727273 0.71875 0.63888889 0.78125 ] mean value: 0.7384532828282828 key: train_jcc value: [0.78125 0.75874126 0.76573427 0.75704225 0.77192982 0.77700348 0.78091873 0.77972028 0.77464789 0.78965517] mean value: 0.7736643154251823 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.0821619 0.07773137 0.08024621 0.07795167 0.07078624 0.07733774 0.07203126 0.07329893 0.23815799 0.07500362] mean value: 0.09247069358825684 key: score_time value: [0.01104927 0.01100373 0.01096082 0.01094604 0.01078987 0.0109508 0.01069236 0.01092672 0.01182914 0.0112741 ] mean value: 0.011042284965515136 key: test_mcc value: [0.96547546 0.92980296 0.8953202 0.8951918 0.82195294 1. 0.96490128 0.89342711 0.93094934 0.92857143] mean value: 0.922559251374398 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.94736842 0.94736842 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.9609962406015038 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.96428571 0.94736842 0.94915254 0.9122807 1. 0.98245614 0.94736842 0.96296296 0.96428571] mean value: 0.9611978799935982 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96428571 0.93333333 0.89655172 1. 0.96551724 0.93103448 1. 0.96428571] mean value: 0.9619293924466339 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.96551724 0.92857143 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.9610837438423645 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98214286 0.96490148 0.9476601 0.94704433 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.960960591133005 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.93103448 0.9 0.90322581 0.83870968 1. 0.96551724 0.9 0.92857143 0.93103448] mean value: 0.9262378833624663 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.34 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04363132 0.07157731 0.04334879 0.08873129 0.07350183 0.06260896 0.08780813 0.07436919 0.07810593 0.07584238] mean value: 0.06995251178741455 key: score_time value: [0.01874518 0.0121007 0.01208496 0.01211381 0.01205897 0.01863194 0.01871586 0.0187521 0.01878953 0.01875067] mean value: 0.016074371337890626 key: test_mcc value: [0.82942474 0.86189955 0.75492611 0.79110556 0.75047877 0.85933785 0.67900461 0.78571429 0.75047877 0.85933785] mean value: 0.7921708093166815 key: train_mcc value: [0.89366043 0.88208839 0.88566582 0.90535473 0.91750062 0.89376313 0.89387399 0.90553988 0.90962508 0.88599845] mean value: 0.8973070516162945 key: test_accuracy value: [0.9122807 0.92982456 0.87719298 0.89473684 
0.875 0.92857143 0.83928571 0.89285714 0.875 0.92857143] mean value: 0.8953320802005013 key: train_accuracy value: [0.94674556 0.9408284 0.94280079 0.95266272 0.95866142 0.94685039 0.94685039 0.95275591 0.95472441 0.94291339] mean value: 0.9485793380856978 key: test_fscore value: [0.91525424 0.93103448 0.87719298 0.9 0.87719298 0.92592593 0.84210526 0.89285714 0.87719298 0.92592593] mean value: 0.8964681925282066 key: train_fscore value: [0.94736842 0.94186047 0.94302554 0.95275591 0.95906433 0.94716243 0.94736842 0.95294118 0.95516569 0.94346979] mean value: 0.9490182161161698 key: test_precision value: [0.87096774 0.9 0.89285714 0.87096774 0.86206897 0.96153846 0.82758621 0.89285714 0.86206897 0.96153846] mean value: 0.8902450830593212 key: train_precision value: [0.93822394 0.92748092 0.9375 0.94901961 0.94980695 0.94163424 0.93822394 0.94921875 0.94594595 0.93436293] mean value: 0.9411417221682514 key: test_recall value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714 0.85714286 0.89285714 0.89285714 0.89285714] mean value: 0.9043103448275862 key: train_recall value: [0.95669291 0.95669291 0.9486166 0.95652174 0.96850394 0.95275591 0.95669291 0.95669291 0.96456693 0.95275591] mean value: 0.957049267062961 key: test_roc_auc value: [0.91317734 0.93041872 0.87746305 0.89408867 0.875 0.92857143 0.83928571 0.89285714 0.875 0.92857143] mean value: 0.8954433497536947 key: train_roc_auc value: [0.9467259 0.94079705 0.94281224 0.95267032 0.95866142 0.94685039 0.94685039 0.95275591 0.95472441 0.94291339] mean value: 0.9485761414210576 key: test_jcc value: [0.84375 0.87096774 0.78125 0.81818182 0.78125 0.86206897 0.72727273 0.80645161 0.78125 0.86206897] mean value: 0.8134511831327738 key: train_jcc value: [0.9 0.89010989 0.89219331 0.90977444 0.92134831 0.89962825 0.9 0.91011236 0.9141791 0.89298893] mean value: 0.903033459606262 MCC on Blind test: 0.22 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: 
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01380754 0.01324916 0.01125026 0.01091313 0.01098466 0.01100469 0.0110321 0.01100469 0.01104236 0.01112938] mean value: 0.011541795730590821 key: score_time value: [0.01176262 0.00988102 0.00967598 0.00937033 0.00944376 0.00940394 0.00945687 0.00941181 0.00942016 0.00950742] mean value: 0.00973339080810547 key: test_mcc value: [0.8953202 0.79161589 0.85960591 0.68850906 0.5728919 0.71428571 0.71428571 0.60753044 0.57142857 0.78772636] mean value: 0.7203199757321368 key: train_mcc value: [0.75937568 0.72007098 0.71203374 0.70842038 0.74805469 0.76387425 0.7480315 0.73230617 0.7170914 0.76380321] mean value: 0.7373061978922988 key: test_accuracy value: [0.94736842 0.89473684 0.92982456 0.84210526 0.78571429 0.85714286 0.85714286 0.80357143 0.78571429 0.89285714] mean value: 0.8596177944862156 key: train_accuracy value: [0.87968442 0.85996055 0.85601578 0.85404339 0.87401575 0.88188976 0.87401575 0.86614173 0.85826772 0.88188976] mean value: 0.8685924614452779 key: test_fscore value: [0.94736842 0.89655172 0.93103448 0.83636364 0.77777778 0.85714286 0.85714286 0.8 0.78571429 0.88888889] mean value: 0.8577984930979486 key: train_fscore value: [0.87968442 0.85884692 0.85544554 0.85140562 0.87351779 0.88095238 0.87401575 0.86561265 0.85542169 0.88142292] mean value: 0.8676325679094097 
key: test_precision value: [0.93103448 0.86666667 0.93103448 0.88461538 0.80769231 0.85714286 0.85714286 0.81481481 0.78571429 0.92307692] mean value: 0.8658935062383338 key: train_precision value: [0.88142292 0.86746988 0.85714286 0.86530612 0.87698413 0.888 0.87401575 0.86904762 0.87295082 0.88492063] mean value: 0.8737260732667103 key: test_recall value: [0.96428571 0.92857143 0.93103448 0.79310345 0.75 0.85714286 0.85714286 0.78571429 0.78571429 0.85714286] mean value: 0.8509852216748768 key: train_recall value: [0.87795276 0.8503937 0.85375494 0.83794466 0.87007874 0.87401575 0.87401575 0.86220472 0.83858268 0.87795276] mean value: 0.8616896455136784 key: test_roc_auc value: [0.9476601 0.8953202 0.92980296 0.8429803 0.78571429 0.85714286 0.85714286 0.80357143 0.78571429 0.89285714] mean value: 0.8597906403940887 key: train_roc_auc value: [0.87968784 0.85997946 0.85601133 0.8540117 0.87401575 0.88188976 0.87401575 0.86614173 0.85826772 0.88188976] mean value: 0.8685910802651645 key: test_jcc value: [0.9 0.8125 0.87096774 0.71875 0.63636364 0.75 0.75 0.66666667 0.64705882 0.8 ] mean value: 0.7552306868495199 key: train_jcc value: [0.78521127 0.75261324 0.74740484 0.74125874 0.7754386 0.78723404 0.77622378 0.7630662 0.74736842 0.78798587] mean value: 0.7663804997708952 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01800466 0.02170086 0.01728868 0.0168364 0.01796818 0.02323866 0.02558088 0.0231514 0.02295661 0.01844239] mean value: 0.02051687240600586 key: score_time value: [0.01028657 0.01107836 0.014539 0.01169682 0.02031279 0.01512098 0.02078176 0.01514125 0.01528645 0.01752949] mean value: 0.015177345275878907 key: test_mcc value: [0.82880708 0.70453109 0.93202124 0.71921182 0.75434227 0.85714286 0.72168784 0.82618439 0.64951905 0.89342711] mean value: 0.7886874758942892 key: train_mcc value: [0.81176962 0.86069176 0.77915876 0.81944159 0.84845848 0.86746041 0.90553988 0.81229142 0.89075842 0.85449628] mean value: 0.8450066627933005 key: test_accuracy value: [0.9122807 0.84210526 0.96491228 0.85964912 0.875 0.92857143 0.85714286 0.91071429 0.82142857 0.94642857] mean value: 0.8918233082706767 key: train_accuracy value: [0.89940828 0.92899408 0.88560158 0.90927022 0.92125984 0.93307087 0.95275591 0.9015748 0.94488189 0.92716535] mean value: 0.9203982823153023 key: test_fscore value: [0.90566038 0.81632653 0.96666667 0.86206897 0.86792453 0.92857143 0.86666667 0.91525424 0.83333333 0.94545455] mean value: 0.890792727977064 key: train_fscore value: [0.88984881 0.92622951 0.89298893 0.90688259 0.91631799 0.9348659 0.95294118 0.90842491 0.94615385 0.92787524] mean value: 0.9202528908003171 key: test_precision value: [0.96 0.95238095 0.93548387 0.86206897 0.92 0.92857143 0.8125 0.87096774 0.78125 0.96296296] mean value: 0.8986185922335811 key: train_precision value: [0.98564593 0.96581197 0.83737024 0.92946058 0.97767857 0.91044776 
0.94921875 0.84931507 0.92481203 0.91891892] mean value: 0.9248679822063575 key: test_recall value: [0.85714286 0.71428571 1. 0.86206897 0.82142857 0.92857143 0.92857143 0.96428571 0.89285714 0.92857143] mean value: 0.8897783251231527 key: train_recall value: [0.81102362 0.88976378 0.95652174 0.88537549 0.86220472 0.96062992 0.95669291 0.97637795 0.96850394 0.93700787] mean value: 0.920410195761103 key: test_roc_auc value: [0.91133005 0.83990148 0.96428571 0.85960591 0.875 0.92857143 0.85714286 0.91071429 0.82142857 0.94642857] mean value: 0.8914408866995074 key: train_roc_auc value: [0.89958296 0.92907161 0.88574118 0.90922318 0.92125984 0.93307087 0.95275591 0.9015748 0.94488189 0.92716535] mean value: 0.9204327596402229 key: test_jcc value: [0.82758621 0.68965517 0.93548387 0.75757576 0.76666667 0.86666667 0.76470588 0.84375 0.71428571 0.89655172] mean value: 0.8062927661963765 key: train_jcc value: [0.80155642 0.86259542 0.80666667 0.82962963 0.84555985 0.87769784 0.91011236 0.83221477 0.89781022 0.86545455] mean value: 0.8529297712747432 MCC on Blind test: 0.27 Accuracy on Blind test: 0.65 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02081895 0.01910305 0.02194667 0.02428484 0.02053165 0.02217913 0.02111697 0.02241921 0.02124858 0.02041411] mean value: 0.02140631675720215 key: score_time value: [0.01168537 0.01173663 0.01170278 0.01167035 0.01188517 0.01176786 0.01174521 0.01173615 0.01174688 0.01172304] mean value: 0.01173994541168213 key: test_mcc value: [0.8615634 0.6746955 0.79161589 0.64889453 0.43519414 0.43759497 0.74535599 0.82195294 0.72168784 0.82195294] mean value: 0.6960508151248677 key: train_mcc value: [0.79945572 0.82446063 0.89465736 0.76270442 0.5287693 0.54398379 0.77497517 0.91002026 0.89833428 0.88188976] mean value: 0.7819250707589638 key: test_accuracy value: [0.92982456 0.8245614 0.89473684 0.80701754 0.67857143 0.66071429 0.85714286 0.91071429 0.85714286 0.91071429] mean value: 0.8331140350877193 key: train_accuracy value: [0.89151874 0.90729783 0.94674556 0.86982249 0.71850394 0.72834646 0.87795276 0.95472441 0.9488189 0.94094488] mean value: 0.8784675953967293 key: test_fscore value: [0.92592593 0.79166667 0.89285714 0.7755102 0.55 0.48648649 0.83333333 0.9122807 0.86666667 0.90909091] mean value: 0.794381803686315 key: train_fscore value: [0.87964989 0.89978678 0.94523327 0.85135135 0.60821918 0.62702703 0.86283186 0.95390782 0.94980695 0.94094488] mean value: 0.8518758998890312 key: test_precision value: [0.96153846 0.95 0.92592593 0.95 0.91666667 1. 1. 0.89655172 0.8125 0.92592593] mean value: 0.9339108704194911 key: train_precision value: [0.99014778 0.98139535 0.97083333 0.9895288 1. 1. 
0.98484848 0.97142857 0.93181818 0.94094488] mean value: 0.9760945381218294 key: test_recall value: [0.89285714 0.67857143 0.86206897 0.65517241 0.39285714 0.32142857 0.71428571 0.92857143 0.92857143 0.89285714] mean value: 0.7267241379310345 key: train_recall value: [0.79133858 0.83070866 0.92094862 0.74703557 0.43700787 0.45669291 0.76771654 0.93700787 0.96850394 0.94094488] mean value: 0.779790544956584 key: test_roc_auc value: [0.92918719 0.82204433 0.8953202 0.80972906 0.67857143 0.66071429 0.85714286 0.91071429 0.85714286 0.91071429] mean value: 0.833128078817734 key: train_roc_auc value: [0.89171672 0.90744919 0.94669478 0.86958078 0.71850394 0.72834646 0.87795276 0.95472441 0.9488189 0.94094488] mean value: 0.878473281254863 key: test_jcc value: [0.86206897 0.65517241 0.80645161 0.63333333 0.37931034 0.32142857 0.71428571 0.83870968 0.76470588 0.83333333] mean value: 0.6808799849194405 key: train_jcc value: [0.78515625 0.81782946 0.89615385 0.74117647 0.43700787 0.45669291 0.75875486 0.91187739 0.90441176 0.88847584] mean value: 0.7597536671094351 MCC on Blind test: 0.35 Accuracy on Blind test: 0.84 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.21202064 0.20090246 0.19995809 0.20182872 0.20169139 0.20114875 0.20171046 0.20118713 0.20071888 0.19601607] mean value: 0.20171825885772704 key: score_time value: [0.01640511 0.01657915 0.01657701 0.01667643 0.01658249 0.01643133 0.0165379 0.01658034 0.01661682 0.01556373] mean value: 0.01645503044128418 key: test_mcc value: [0.96547546 0.85960591 0.8953202 0.96547546 0.82195294 0.96490128 0.96490128 0.89342711 1. 0.96490128] mean value: 0.929596092121727 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.92982456 0.94736842 0.98245614 0.91071429 0.98214286 0.98214286 0.94642857 1. 0.98214286] mean value: 0.9645676691729324 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.92857143 0.94736842 0.98305085 0.9122807 0.98181818 0.98245614 0.94736842 1. 0.98181818] mean value: 0.9646550505694127 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.92857143 0.96428571 0.96666667 0.89655172 1. 0.96551724 0.93103448 1. 1. ] mean value: 0.9652627257799672 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.92857143 0.93103448 1. 0.92857143 0.96428571 1. 0.96428571 1. 0.96428571] mean value: 0.9645320197044335 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.92980296 0.9476601 0.98214286 0.91071429 0.98214286 0.98214286 0.94642857 1. 
0.98214286] mean value: 0.9645320197044336 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.86666667 0.9 0.96666667 0.83870968 0.96428571 0.96551724 0.9 1. 0.96428571] mean value: 0.9330417394989141 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.38 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06119132 0.09244418 0.08146381 0.08729506 0.08371305 0.07937193 0.07151246 0.07341266 0.07586217 0.08966494] mean value: 0.07959315776824952 key: score_time value: [0.02624488 0.02854729 0.03578186 0.03146505 0.0310235 0.02910662 0.03994799 0.02881432 0.02941585 0.0330739 ] mean value: 0.03134212493896484 key: test_mcc value: [0.96547546 0.8953202 0.8953202 0.93202124 0.82618439 1. 0.96490128 0.89342711 0.93094934 0.92857143] mean value: 0.9232170639899975 key: train_mcc value: [0.99214142 0.99211042 0.98823457 1. 0.99212598 0.98819663 0.98038334 1. 0.99212598 0.98425197] mean value: 0.990957032924043 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.96491228 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.9609962406015038 key: train_accuracy value: [0.99605523 0.99605523 0.99408284 1. 0.99606299 0.99409449 0.99015748 1. 0.99606299 0.99212598] mean value: 0.9954697230893476 key: test_fscore value: [0.98181818 0.94736842 0.94736842 0.96666667 0.91525424 1. 
0.98245614 0.94736842 0.96296296 0.96428571] mean value: 0.9615549166530434 key: train_fscore value: [0.99604743 0.99606299 0.99403579 1. 0.99606299 0.99408284 0.99009901 1. 0.99606299 0.99212598] mean value: 0.9954580026885907 key: test_precision value: [1. 0.93103448 0.96428571 0.93548387 0.87096774 1. 0.96551724 0.93103448 1. 0.96428571] mean value: 0.9562609248371206 key: train_precision value: [1. 0.99606299 1. 1. 0.99606299 0.99604743 0.99601594 1. 0.99606299 0.99212598] mean value: 0.9972378327714941 key: test_recall value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.968103448275862 key: train_recall value: [0.99212598 0.99606299 0.98814229 1. 0.99606299 0.99212598 0.98425197 1. 0.99606299 0.99212598] mean value: 0.9936961190127914 key: test_roc_auc value: [0.98214286 0.9476601 0.9476601 0.96428571 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.960960591133005 key: train_roc_auc value: [0.99606299 0.99605521 0.99407115 1. 0.99606299 0.99409449 0.99015748 1. 0.99606299 0.99212598] mean value: 0.9954693286856929 key: test_jcc value: [0.96428571 0.9 0.9 0.93548387 0.84375 1. 0.96551724 0.9 0.92857143 0.93103448] mean value: 0.9268642737962816 key: train_jcc value: [0.99212598 0.99215686 0.98814229 1. 0.99215686 0.98823529 0.98039216 1. 
0.99215686 0.984375 ] mean value: 0.9909741315957773 MCC on Blind test: 0.08 Accuracy on Blind test: 0.33 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.17389727 0.18126535 0.21880174 0.18185973 0.17694855 0.22035027 0.17903447 0.17763066 0.12950563 0.20781183] mean value: 0.18471055030822753 key: score_time value: [0.02546906 0.03429747 0.0409584 0.03237605 0.0260911 0.02633905 0.02989173 0.02581334 0.0195868 0.02531815] mean value: 0.028614115715026856 key: test_mcc value: [0.85960591 0.54592083 0.65104858 0.65018988 0.4330127 0.67900461 0.53605627 0.5728919 0.64285714 0.82195294] mean value: 0.6392540773496393 key: train_mcc value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98428248 0.98437404 0.98437404 0.98437404 0.98428248 0.98437404] mean value: 0.9851217587100868 key: test_accuracy value: [0.92982456 0.77192982 0.8245614 0.8245614 0.71428571 0.83928571 0.76785714 0.78571429 0.82142857 0.91071429] mean value: 0.819016290726817 key: train_accuracy value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598] mean value: 0.9925142493283015 key: test_fscore value: [0.92857143 0.75471698 0.82142857 0.83333333 0.69230769 
0.83636364 0.76363636 0.77777778 0.82142857 0.90909091] mean value: 0.813865526507036 key: train_fscore value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99209486 0.99206349 0.99206349 0.99206349 0.99209486 0.99206349] mean value: 0.9924634247376443 key: test_precision value: [0.92857143 0.8 0.85185185 0.80645161 0.75 0.85185185 0.77777778 0.80769231 0.82142857 0.92592593] mean value: 0.8321551328002941 key: train_precision value: [1. 1. 1. 1. 0.99603175 1. 1. 1. 0.99603175 1. ] mean value: 0.9992063492063492 key: test_recall value: [0.92857143 0.71428571 0.79310345 0.86206897 0.64285714 0.82142857 0.75 0.75 0.82142857 0.89285714] mean value: 0.7976600985221675 key: train_recall value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197 0.98425197 0.98425197 0.98818898 0.98425197] mean value: 0.985815878746382 key: test_roc_auc value: [0.92980296 0.77093596 0.82512315 0.82389163 0.71428571 0.83928571 0.76785714 0.78571429 0.82142857 0.91071429] mean value: 0.8189039408866995 key: train_roc_auc value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598] mean value: 0.9925142385857895 key: test_jcc value: [0.86666667 0.60606061 0.6969697 0.71428571 0.52941176 0.71875 0.61764706 0.63636364 0.6969697 0.83333333] mean value: 0.6916458174178762 key: train_jcc value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98431373 0.98425197 0.98425197 0.98425197 0.98431373 0.98425197] mean value: 0.9850408285688307 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.79057455 0.77769208 0.76583242 0.77488422 0.77650523 0.7766695 0.78173661 0.77217269 0.76850343 0.77668691] mean value: 0.7761257648468017 key: score_time value: [0.0105288 0.00939512 0.00932217 0.0093441 0.00973582 0.00938821 0.00934148 0.00942039 0.00945306 0.00929689] mean value: 0.009522604942321777 key: test_mcc value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1. 0.92857143 0.89342711 0.96490128 0.89342711] mean value: 0.9259382487307398 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1. 0.96428571 0.94642857 0.98214286 0.94642857] mean value: 0.962719298245614 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807 1. 0.96428571 0.94736842 0.98181818 0.94736842] mean value: 0.9631409244113418 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96551724 0.93548387 0.89655172 1. 0.96428571 0.93103448 1. 0.93103448] mean value: 0.9588193230573653 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1. 0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9679802955665024 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1. 
0.96428571 0.94642857 0.98214286 0.94642857] mean value: 0.9626231527093597 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1. 0.93103448 0.9 0.96428571 0.9 ] mean value: 0.92981672758091 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.29 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03601861 0.03224778 0.03218007 0.03201056 0.03217316 0.03178501 0.03229713 0.03250289 0.05251861 0.08426547] mean value: 0.03979992866516113 key: score_time value: [0.0125165 0.01252484 0.01253128 0.01482654 0.01470113 0.01488376 0.01485562 0.01477528 0.01557207 0.02143931] mean value: 0.014862632751464844 key: test_mcc value: [ 0.33525717 -0.06746787 0.38590439 0.26729964 -0.06262243 0.10206207 0.14586499 0.10206207 0.53881591 0.18650096] mean value: 0.19336769075937818 key: train_mcc value: [0.67107707 0.44011793 0.61033709 0.4792439 0.40307741 0.37626192 0.35901099 0.52572037 0.9453509 0.35901099] mean value: 0.516920856933795 key: test_accuracy value: [0.63157895 0.47368421 0.66666667 0.61403509 0.48214286 0.53571429 0.55357143 0.53571429 0.76785714 0.57142857] mean value: 0.5832393483709273 key: train_accuracy value: [0.81065089 0.66272189 0.77120316 0.68639053 0.63976378 
0.62401575 0.61417323 0.71653543 0.97244094 0.61417323] mean value: 0.7112068831632732 key: test_fscore value: [0.71232877 0.625 0.73972603 0.7027027 0.63291139 0.65789474 0.66666667 0.65789474 0.77966102 0.67567568] mean value: 0.685046172260402 key: train_fscore value: [0.8410596 0.74815906 0.81350482 0.76090226 0.73516643 0.7267525 0.72159091 0.7791411 0.97286822 0.72159091] mean value: 0.7820735807454069 key: test_precision value: [0.57777778 0.48076923 0.61363636 0.57777778 0.49019608 0.52083333 0.53191489 0.52083333 0.74193548 0.54347826] mean value: 0.5599152533416744 key: train_precision value: [0.72571429 0.59764706 0.68563686 0.61407767 0.5812357 0.57078652 0.56444444 0.63819095 0.95801527 0.56444444] mean value: 0.6500193196442058 key: test_recall value: [0.92857143 0.89285714 0.93103448 0.89655172 0.89285714 0.89285714 0.89285714 0.89285714 0.82142857 0.89285714] mean value: 0.893472906403941 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98818898 1. ] mean value: 0.9988188976377953 key: test_roc_auc value: [0.63669951 0.48091133 0.66194581 0.60899015 0.48214286 0.53571429 0.55357143 0.53571429 0.76785714 0.57142857] mean value: 0.5834975369458129 key: train_roc_auc value: [0.81027668 0.66205534 0.77165354 0.68700787 0.63976378 0.62401575 0.61417323 0.71653543 0.97244094 0.61417323] mean value: 0.7112095795337836 key: test_jcc value: [0.55319149 0.45454545 0.58695652 0.54166667 0.46296296 0.49019608 0.5 0.49019608 0.63888889 0.51020408] mean value: 0.5228808222660204 key: train_jcc value: [0.72571429 0.59764706 0.68563686 0.61407767 0.5812357 0.57078652 0.56444444 0.63819095 0.94716981 0.56444444] mean value: 0.648934774058724 MCC on Blind test: -0.04 Accuracy on Blind test: 0.19 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02779937 0.02872729 0.01580143 0.01564527 0.01578498 0.04575109 0.04329133 0.04417109 0.03463864 0.02420235] mean value: 0.029581284523010253 key: score_time value: [0.02095222 0.01218772 0.01191568 0.01196671 0.01234722 0.01872563 0.02054381 0.02364755 0.02555585 0.0186007 ] mean value: 0.017644309997558595 key: test_mcc value: [0.92980296 0.8953202 0.82512315 0.82490815 0.75434227 0.78772636 0.71611487 0.78571429 0.71611487 0.82618439] mean value: 0.8061351507165199 key: train_mcc value: [0.86198955 0.86998617 0.85801653 0.86210547 0.88599845 0.85465533 0.86624915 0.87013943 0.87040934 0.86253233] mean value: 0.8662081754476578 key: test_accuracy value: [0.96491228 0.94736842 0.9122807 0.9122807 0.875 0.89285714 0.85714286 0.89285714 0.85714286 0.91071429] mean value: 0.9022556390977443 key: train_accuracy value: [0.93096647 0.93491124 0.92899408 0.93096647 0.94291339 0.92716535 0.93307087 0.93503937 0.93503937 0.93110236] mean value: 0.933016897296122 key: test_fscore value: [0.96428571 0.94736842 0.9122807 0.91525424 0.88135593 0.88888889 0.86206897 0.89285714 0.86206897 0.90566038] mean value: 0.9032089346723262 key: train_fscore value: [0.93150685 0.93567251 0.92913386 0.93150685 0.94346979 0.92815534 0.93359375 0.93542074 0.93592233 0.93203883] mean value: 0.9336420855587076 key: test_precision value: [0.96428571 0.93103448 0.92857143 0.9 0.83870968 0.92307692 0.83333333 0.89285714 0.83333333 0.96 ] mean value: 
0.9005202035635851 key: train_precision value: [0.92607004 0.92664093 0.9254902 0.92248062 0.93436293 0.91570881 0.92635659 0.92996109 0.92337165 0.91954023] mean value: 0.924998308444446 key: test_recall value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.85714286 0.89285714 0.89285714 0.89285714 0.85714286] mean value: 0.9077586206896552 key: train_recall value: [0.93700787 0.94488189 0.93280632 0.94071146 0.95275591 0.94094488 0.94094488 0.94094488 0.9488189 0.94488189] mean value: 0.942469888892347 key: test_roc_auc value: [0.96490148 0.9476601 0.91256158 0.91194581 0.875 0.89285714 0.85714286 0.89285714 0.85714286 0.91071429] mean value: 0.9022783251231528 key: train_roc_auc value: [0.93095453 0.93489154 0.92900159 0.93098565 0.94291339 0.92716535 0.93307087 0.93503937 0.93503937 0.93110236] mean value: 0.9330164016059258 key: test_jcc value: [0.93103448 0.9 0.83870968 0.84375 0.78787879 0.8 0.75757576 0.80645161 0.75757576 0.82758621] mean value: 0.8250562283008056 key: train_jcc value: [0.87179487 0.87912088 0.86764706 0.87179487 0.89298893 0.86594203 0.87545788 0.87867647 0.87956204 0.87272727] mean value: 0.8755712302977963 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.18849206 0.29984403 0.31634831 0.33595371 0.3800807 0.31235695 0.30970716 0.2922895 0.28889799 0.29170918] mean value: 0.3015679597854614 key: score_time value: [0.01897597 0.01867318 0.02272344 0.01876187 0.02406073 0.01955128 0.01868868 0.01871037 0.01880717 0.01869726] mean value: 0.01976499557495117 key: test_mcc value: [0.92980296 0.8953202 0.85960591 0.82490815 0.75434227 0.78772636 0.71611487 0.78571429 0.71611487 0.82618439] mean value: 0.8095834265785888 key: train_mcc value: [0.86198955 0.86998617 0.88168563 0.86210547 0.90174953 0.85465533 0.86624915 0.87013943 0.89020543 0.86253233] mean value: 0.8721298027814565 key: test_accuracy value: [0.96491228 0.94736842 0.92982456 0.9122807 0.875 0.89285714 0.85714286 0.89285714 0.85714286 0.91071429] mean value: 0.9040100250626566 key: train_accuracy value: [0.93096647 0.93491124 0.9408284 0.93096647 0.9507874 0.92716535 0.93307087 0.93503937 0.94488189 0.93110236] mean value: 0.9359719827920918 key: test_fscore value: [0.96428571 0.94736842 0.93103448 0.91525424 0.88135593 0.88888889 0.86206897 0.89285714 0.86206897 0.90566038] mean value: 0.9050843127727497 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:135: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:138: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93150685 0.93567251 0.94094488 0.93150685 0.95126706 0.92815534 0.93359375 0.93542074 0.94573643 0.93203883] mean value: 0.9365843254175729 key: test_precision value: [0.96428571 0.93103448 0.93103448 0.9 0.83870968 0.92307692 0.83333333 0.89285714 0.83333333 0.96 ] mean value: 0.9007665089823044 key: train_precision value: [0.92607004 0.92664093 0.9372549 0.92248062 0.94208494 0.91570881 0.92635659 0.92996109 0.93129771 0.91954023] mean value: 0.9277395860462906 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.85714286 0.89285714 0.89285714 0.89285714 0.85714286] mean value: 0.9112068965517242 key: train_recall value: [0.93700787 0.94488189 0.94466403 0.94071146 0.96062992 0.94094488 0.94094488 0.94094488 0.96062992 0.94488189] mean value: 0.945624163580343 key: test_roc_auc value: [0.96490148 0.9476601 0.92980296 0.91194581 0.875 0.89285714 0.85714286 0.89285714 0.85714286 0.91071429] mean value: 0.9040024630541872 key: train_roc_auc value: [0.93095453 0.93489154 0.94083595 0.93098565 0.9507874 0.92716535 0.93307087 0.93503937 0.94488189 0.93110236] mean value: 0.9359714917058293 key: test_jcc value: [0.93103448 0.9 0.87096774 0.84375 0.78787879 0.8 0.75757576 0.80645161 0.75757576 0.82758621] mean value: 0.8282820347524185 key: train_jcc value: [0.87179487 0.87912088 0.88847584 0.87179487 0.9070632 0.86594203 0.87545788 0.87867647 0.89705882 0.87272727] mean value: 0.8808112127456175 MCC on Blind test: 0.22 Accuracy on Blind test: 0.68 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic 
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.0371778 0.03735137 0.03989172 0.03771639 0.0379281 0.03412724 0.0368371 0.03751278 0.03696871 0.0387764 ] mean value: 0.037428760528564455 key: score_time value: [0.01196814 0.01196027 0.01489449 0.01493311 0.01487374 0.01206851 0.01477408 0.01476097 0.01480389 0.01474452] mean value: 0.013978171348571777 key: test_mcc value: [0.92980296 0.86189955 0.79161589 0.79110556 0.71611487 0.8660254 0.64450339 0.82195294 0.75047877 0.85933785] mean value: 0.8032837186777093 key: train_mcc value: [0.84631191 0.86587719 0.86598917 0.87771934 0.87412415 0.84262418 0.85042006 0.86614173 0.87828635 0.86237183] mean value: 0.8629865906620574 key: test_accuracy value: [0.96491228 0.92982456 0.89473684 0.89473684 0.85714286 0.92857143 0.82142857 0.91071429 0.875 0.92857143] mean value: 0.9005639097744361 key: train_accuracy value: [0.92307692 0.93293886 0.93293886 0.93885602 0.93700787 0.92125984 0.92519685 0.93307087 0.93897638 0.93110236] mean value: 0.9314424824115921 key: test_fscore value: [0.96428571 0.93103448 0.89285714 0.9 0.86206897 0.92307692 0.82758621 0.9122807 0.87719298 0.92592593] mean value: 0.9016309045528647 key: train_fscore value: [0.92397661 0.93307087 0.93333333 0.93885602 0.9375 0.921875 0.9254902 0.93307087 0.93980583 0.93177388] mean value: 0.9318752590046475 key: 
test_precision value: [0.96428571 0.9 0.92592593 0.87096774 0.83333333 1. 0.8 0.89655172 0.86206897 0.96153846] mean value: 0.9014671866674091 key: train_precision value: [0.91505792 0.93307087 0.92607004 0.93700787 0.93023256 0.91472868 0.921875 0.93307087 0.92720307 0.92277992] mean value: 0.9261096788491734 key: test_recall value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.85714286 0.85714286 0.92857143 0.89285714 0.89285714] mean value: 0.9043103448275862 key: train_recall value: [0.93307087 0.93307087 0.94071146 0.94071146 0.94488189 0.92913386 0.92913386 0.93307087 0.95275591 0.94094488] mean value: 0.937748591702717 key: test_roc_auc value: [0.96490148 0.93041872 0.8953202 0.89408867 0.85714286 0.92857143 0.82142857 0.91071429 0.875 0.92857143] mean value: 0.9006157635467981 key: train_roc_auc value: [0.92305717 0.9329386 0.93295416 0.93885967 0.93700787 0.92125984 0.92519685 0.93307087 0.93897638 0.93110236] mean value: 0.9314423765211167 key: test_jcc value: [0.93103448 0.87096774 0.80645161 0.81818182 0.75757576 0.85714286 0.70588235 0.83870968 0.78125 0.86206897] mean value: 0.8229265266375536 key: train_jcc value: [0.85869565 0.87453875 0.875 0.88475836 0.88235294 0.85507246 0.86131387 0.87453875 0.88644689 0.87226277] mean value: 0.8724980440988328 MCC on Blind test: 0.31 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity',
                                                         ...
                                                         'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
                                                        dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
                                                        dtype='object'))])),
                ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.89304805 1.03817892 0.89372754 1.10189581 0.90773821 0.89544559 1.06514525 0.90930581 1.09472632 0.9275372 ]
mean value: 0.9726748704910279

key: score_time
value: [0.01456618 0.01514888 0.01538634 0.01519942 0.01540756 0.0152154 0.01212883 0.01528525 0.01524878 0.01212168]
mean value: 0.014570832252502441

key: test_mcc
value: [0.92980296 0.82512315 0.82512315 0.75462449 0.71611487 0.89802651 0.60753044 0.82195294 0.75047877 0.85933785]
mean value: 0.7988115143025961

key: train_mcc
value: [0.88954592 0.8974355 0.90138653 0.94089268 0.90158179 0.94112724 0.8307151 0.89766562 0.90191737 0.81892302]
mean value: 0.8921190789888661

key: test_accuracy
value: [0.96491228 0.9122807 0.9122807 0.87719298 0.85714286 0.94642857 0.80357143 0.91071429 0.875 0.92857143]
mean value: 0.8988095238095238

key: train_accuracy
value: [0.94477318 0.94871795 0.95069034 0.9704142 0.9507874 0.97047244 0.91535433 0.9488189 0.9507874 0.90944882]
mean value: 0.946026495208809

key: test_fscore
value: [0.96428571 0.9122807 0.9122807 0.88135593 0.86206897 0.94339623 0.80701754 0.9122807 0.87719298 0.92592593]
mean value: 0.8998085395926314

key: train_fscore
value: [0.94488189 0.9488189 0.95049505 0.97017893 0.95088409 0.97017893 0.91552063 0.94901961 0.95145631 0.90980392]
mean value: 0.9461238245008307

key: test_precision
value: [0.96428571 0.89655172 0.92857143 0.86666667 0.83333333 1. 0.79310345 0.89655172 0.86206897 0.96153846]
mean value: 0.900267146646457

key: train_precision
value: [0.94488189 0.9488189 0.95238095 0.976 0.94901961 0.97991968 0.91372549 0.9453125 0.93869732 0.90625 ]
mean value: 0.9455006334544265

key: test_recall
value: [0.96428571 0.92857143 0.89655172 0.89655172 0.89285714 0.89285714 0.82142857 0.92857143 0.89285714 0.89285714]
mean value: 0.9007389162561577

key: train_recall
value: [0.94488189 0.9488189 0.9486166 0.96442688 0.95275591 0.96062992 0.91732283 0.95275591 0.96456693 0.91338583]
mean value: 0.946816158849709

key: test_roc_auc
value: [0.96490148 0.91256158 0.91256158 0.87684729 0.85714286 0.94642857 0.80357143 0.91071429 0.875 0.92857143]
mean value: 0.8988300492610838

key: train_roc_auc
value: [0.94477296 0.94871775 0.95068625 0.97040242 0.9507874 0.97047244 0.91535433 0.9488189 0.9507874 0.90944882]
mean value: 0.9460248669509197

key: test_jcc
value: [0.93103448 0.83870968 0.83870968 0.78787879 0.75757576 0.89285714 0.67647059 0.83870968 0.78125 0.86206897]
mean value: 0.8205264757080909

key: train_jcc
value: [0.89552239 0.90262172 0.90566038 0.94208494 0.90636704 0.94208494 0.8442029 0.90298507 0.90740741 0.83453237]
mean value: 0.8983469168318737

MCC on Blind test: 0.24
Accuracy on Blind test: 0.64

Model_name: Gaussian NB
Model func: GaussianNB()

Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity',
                                                         ...
                                                         'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
                                                        dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
                                                        dtype='object'))])),
                ('model', GaussianNB())])

key: fit_time
value: [0.01492214 0.010638 0.01045823 0.01021338 0.01002026 0.01002669 0.01005268 0.01020908 0.00999951 0.01019502]
mean value: 0.01067349910736084

key: score_time
value: [0.01205015 0.00935721 0.0090673 0.00890112 0.00881839 0.00884724 0.00882244 0.00877309 0.00878334 0.00903559]
mean value: 0.009245586395263673

key: test_mcc
value: [0.8615634 0.55091314 0.72706729 0.62473685 0.4330127 0.47951222 0.79385662 0.50128041 0.54446551 0.75434227]
mean value: 0.6270750410241596

key: train_mcc
value: [0.66657128 0.67011432 0.64146859 0.66742134 0.63711603 0.67419557 0.66977469 0.70816293 0.65740182 0.67253825]
mean value: 0.6664764817127335

key: test_accuracy
value: [0.92982456 0.77192982 0.85964912 0.80701754 0.71428571 0.73214286 0.89285714 0.75 0.76785714 0.875 ]
mean value: 0.8100563909774436

key: train_accuracy
value: [0.82840237 0.83037475 0.81656805 0.83037475 0.80511811 0.83267717 0.83070866 0.8523622 0.82480315 0.83070866]
mean value: 0.8282097873860442

key: test_fscore
value: [0.92592593 0.74509804 0.85185185 0.79245283 0.69230769 0.69387755 0.88461538 0.74074074 0.74509804 0.86792453]
mean value: 0.7939892583383943

key: train_fscore
value: [0.81290323 0.81545064 0.8 0.81702128 0.77241379 0.81798715 0.81623932 0.8447205 0.81023454 0.81385281]
mean value: 0.8120823259881095

key: test_precision
value: [0.96153846 0.82608696 0.92 0.875 0.75 0.80952381 0.95833333 0.76923077 0.82608696 0.92 ]
mean value: 0.8615800286669852

key: train_precision
value: [0.8957346 0.89622642 0.87735849 0.88479263 0.9281768 0.89671362 0.89252336 0.89082969 0.88372093 0.90384615]
mean value: 0.8949922683036309

key: test_recall
value: [0.89285714 0.67857143 0.79310345 0.72413793 0.64285714 0.60714286 0.82142857 0.71428571 0.67857143 0.82142857]
mean value: 0.7374384236453202

key: train_recall
value: [0.74409449 0.7480315 0.73517787 0.75889328 0.66141732 0.7519685 0.7519685 0.80314961 0.7480315 0.74015748]
mean value: 0.7442890043882855

key: test_roc_auc
value: [0.92918719 0.7703202 0.86083744 0.80849754 0.71428571 0.73214286 0.89285714 0.75 0.76785714 0.875 ]
mean value: 0.8100985221674877

key: train_roc_auc
value: [0.82856898 0.83053749 0.81640783 0.83023404 0.80511811 0.83267717 0.83070866 0.8523622 0.82480315 0.83070866]
mean value: 0.8282126295477887

key: test_jcc
value: [0.86206897 0.59375 0.74193548 0.65625 0.52941176 0.53125 0.79310345 0.58823529 0.59375 0.76666667]
mean value: 0.6656421623154267

key: train_jcc
value: [0.68478261 0.6884058 0.66666667 0.69064748 0.62921348 0.69202899 0.68953069 0.7311828 0.68100358 0.68613139]
mean value: 0.6839593475841678

MCC on Blind test: 0.33
Accuracy on Blind test: 0.75

Model_name: Naive Bayes
Model func: BernoulliNB()

Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity',
                                                         ...
                                                         'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
                                                        dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
                                                        dtype='object'))])),
                ('model', BernoulliNB())])

key: fit_time
value: [0.01052785 0.01058602 0.01057577 0.01063657 0.01053834 0.01070213 0.01048851 0.01046777 0.01058173 0.0106101 ]
mean value: 0.010571479797363281

key: score_time
value: [0.0089097 0.00898433 0.00901771 0.00898981 0.0089891 0.00899673 0.00903559 0.00891829 0.00887704 0.00920439]
mean value: 0.008992266654968262

key: test_mcc
value: [0.8953202 0.7589669 0.68472906 0.58562417 0.64450339 0.75047877 0.64285714 0.71428571 0.53881591 0.75047877]
mean value: 0.6966060026275183

key: train_mcc
value: [0.74761876 0.73570695 0.73177298 0.73972796 0.74414639 0.74805469 0.75211424 0.7480315 0.74812427 0.7486119 ]
mean value: 0.7443909629951391

key: test_accuracy
value: [0.94736842 0.87719298 0.84210526 0.78947368 0.82142857 0.875 0.82142857 0.85714286 0.76785714 0.875 ]
mean value: 0.8473997493734335

key: train_accuracy
value: [0.87376726 0.8678501 0.86587771 0.86982249 0.87204724 0.87401575 0.87598425 0.87401575 0.87401575 0.87401575]
mean value: 0.8721412042429607

key: test_fscore
value: [0.94736842 0.88135593 0.84210526 0.77777778 0.82758621 0.87719298 0.82142857 0.85714286 0.77966102 0.87719298]
mean value: 0.8488812011521107

key: train_fscore
value: [0.875 0.8678501 0.86507937 0.8685259 0.87128713 0.87351779 0.87475149 0.87401575 0.87301587 0.87644788]
mean value: 0.8719491263936097

key: test_precision
value: [0.93103448 0.83870968 0.85714286 0.84 0.8 0.86206897 0.82142857 0.85714286 0.74193548 0.86206897]
mean value: 0.8411531860797712

key: train_precision
value: [0.86821705 0.86956522 0.8685259 0.87550201 0.87649402 0.87698413 0.88353414 0.87401575 0.88 0.85984848]
mean value:
0.8732686696416017 key: test_recall value: [0.96428571 0.92857143 0.82758621 0.72413793 0.85714286 0.89285714 0.82142857 0.85714286 0.82142857 0.89285714] mean value: 0.858743842364532 key: train_recall value: [0.88188976 0.86614173 0.86166008 0.86166008 0.86614173 0.87007874 0.86614173 0.87401575 0.86614173 0.89370079] mean value: 0.8707572126606704 key: test_roc_auc value: [0.9476601 0.87807882 0.84236453 0.79064039 0.82142857 0.875 0.82142857 0.85714286 0.76785714 0.875 ] mean value: 0.8476600985221675 key: train_roc_auc value: [0.87375121 0.86785347 0.86586941 0.86980642 0.87204724 0.87401575 0.87598425 0.87401575 0.87401575 0.87401575] mean value: 0.8721374996109676 key: test_jcc value: [0.9 0.78787879 0.72727273 0.63636364 0.70588235 0.78125 0.6969697 0.75 0.63888889 0.78125 ] mean value: 0.7405756090314914 key: train_jcc value: [0.77777778 0.76655052 0.76223776 0.76760563 0.77192982 0.7754386 0.77738516 0.77622378 0.77464789 0.78006873] mean value: 0.7729865668599729 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00995111 0.0098927 0.00990295 0.01049066 0.01074862 0.01070476 0.01121283 0.01361084 0.01100898 0.01114106] mean value: 0.010866451263427734 key: score_time value: [0.0171814 0.01235557 0.0125854 0.01293373 0.01254654 0.01322293 0.01275516 0.01698422 0.01724505 0.01798248] mean value: 0.014579248428344727 key: test_mcc value: [0.69397486 0.40447771 0.34042547 0.54592083 0.42966892 0.57735027 0.42857143 0.50518149 0.57142857 0.67900461] mean value: 0.5176004157137253 key: train_mcc value: [0.68074909 0.71607321 0.66913289 0.70418327 0.70921127 0.68153033 0.71326761 0.68114987 0.67411185 0.67887215] mean value: 0.6908281541769783 key: test_accuracy value: [0.84210526 0.70175439 0.66666667 0.77192982 0.71428571 0.78571429 0.71428571 0.75 0.78571429 0.83928571] mean value: 0.7571741854636591 key: train_accuracy value: [0.84023669 0.85798817 0.83431953 0.85207101 0.85433071 0.84055118 0.85629921 0.84055118 0.83661417 0.83858268] mean value: 0.8451544518473653 key: test_fscore value: [0.82352941 0.67924528 0.64150943 0.78688525 0.7037037 0.76923077 0.71428571 0.73076923 0.78571429 0.83636364] mean value: 0.7471236714714817 key: train_fscore value: [0.83832335 0.85714286 0.83064516 0.85089463 0.85140562 0.83767535 0.85311871 0.83960396 0.83232323 0.83265306] mean value: 0.8423785943342119 key: test_precision value: [0.91304348 0.72 0.70833333 0.75 0.73076923 0.83333333 0.71428571 0.79166667 0.78571429 0.85185185] mean value: 0.7798997894215285 key: train_precision value: [0.85020243 0.864 0.84773663 0.856 0.86885246 0.85306122 0.87242798 0.84462151 0.85477178 
0.86440678] mean value: 0.857608079954709 key: test_recall value: [0.75 0.64285714 0.5862069 0.82758621 0.67857143 0.71428571 0.71428571 0.67857143 0.78571429 0.82142857] mean value: 0.7199507389162562 key: train_recall value: [0.82677165 0.8503937 0.81422925 0.8458498 0.83464567 0.82283465 0.83464567 0.83464567 0.81102362 0.80314961] mean value: 0.8278189287603872 key: test_roc_auc value: [0.84051724 0.70073892 0.66810345 0.77093596 0.71428571 0.78571429 0.71428571 0.75 0.78571429 0.83928571] mean value: 0.7569581280788177 key: train_roc_auc value: [0.8402633 0.85800317 0.83427998 0.85205876 0.85433071 0.84055118 0.85629921 0.84055118 0.83661417 0.83858268] mean value: 0.8451534343780149 key: test_jcc value: [0.7 0.51428571 0.47222222 0.64864865 0.54285714 0.625 0.55555556 0.57575758 0.64705882 0.71875 ] mean value: 0.6000135682856271 key: train_jcc value: [0.72164948 0.75 0.71034483 0.74048443 0.74125874 0.72068966 0.74385965 0.72354949 0.71280277 0.71328671] mean value: 0.7277925756249406 MCC on Blind test: 0.24 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02651811 0.02431774 0.02231216 0.02248645 0.02474499 0.02260637 0.02188492 0.02576017 0.02266097 0.02236414] mean value: 0.02356560230255127 key: score_time value: [0.01297498 0.01320481 0.01189494 0.01260996 0.0117631 0.01185036 0.01195765 0.01315618 0.01193786 0.01203465] mean value: 0.01233844757080078 key: test_mcc value: [0.8953202 0.8953202 0.85960591 0.75462449 0.75047877 0.78772636 0.64285714 0.71611487 0.67900461 0.85933785] mean value: 0.7840390407141125 key: train_mcc value: [0.77909184 0.79489255 0.78304441 0.80278863 0.80714291 0.79139378 0.80709287 0.79926835 0.81501748 0.79149195] mean value: 0.7971224773953642 key: test_accuracy value: [0.94736842 0.94736842 0.92982456 0.87719298 0.875 0.89285714 0.82142857 0.85714286 0.83928571 0.92857143] mean value: 0.8916040100250626 key: train_accuracy value: [0.88954635 0.8974359 0.89151874 0.90138067 0.90354331 0.89566929 0.90354331 0.8996063 0.90748031 0.89566929] mean value: 0.8985393467828355 key: test_fscore value: [0.94736842 0.94736842 0.93103448 0.88135593 0.87719298 0.88888889 0.82142857 0.85185185 0.84210526 0.92592593] mean value: 0.8914520740776547 key: train_fscore value: [0.88976378 0.89803922 0.89151874 0.9015748 0.90410959 0.8962818 0.90373281 0.90019569 0.90802348 0.89668616] mean value: 0.8989926072825011 key: test_precision value: [0.93103448 0.93103448 0.93103448 0.86666667 0.86206897 0.92307692 0.82142857 0.88461538 0.82758621 0.96153846] mean value: 0.8940084628015662 key: train_precision value: [0.88976378 0.89453125 0.88976378 0.89803922 0.89883268 0.89105058 0.90196078 0.89494163 
0.90272374 0.88803089] mean value: 0.8949638335218302 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286 0.82142857 0.82142857 0.85714286 0.89285714] mean value: 0.8899014778325123 key: train_recall value: [0.88976378 0.9015748 0.89328063 0.90513834 0.90944882 0.9015748 0.90551181 0.90551181 0.91338583 0.90551181] mean value: 0.9030702436898945 key: test_roc_auc value: [0.9476601 0.9476601 0.92980296 0.87684729 0.875 0.89285714 0.82142857 0.85714286 0.83928571 0.92857143] mean value: 0.8916256157635468 key: train_roc_auc value: [0.88954592 0.89742772 0.89152221 0.90138807 0.90354331 0.89566929 0.90354331 0.8996063 0.90748031 0.89566929] mean value: 0.8985395723755875 key: test_jcc value: [0.9 0.9 0.87096774 0.78787879 0.78125 0.8 0.6969697 0.74193548 0.72727273 0.86206897] mean value: 0.8068343403444905 key: train_jcc value: [0.80141844 0.81494662 0.80427046 0.82078853 0.825 0.81205674 0.82437276 0.81850534 0.83154122 0.81272085] mean value: 0.8165620954250901 MCC on Blind test: 0.23 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.59886169 1.92170119 2.23752618 2.16796374 2.11638856 2.47973561 2.14198256 3.86806488 3.29336166 2.04994941] mean value: 2.3875535488128663 key: score_time value: [0.01230478 0.01873446 0.01443458 0.02749872 0.01352763 0.01245713 0.0131278 0.01426792 0.01233697 0.01281023] mean value: 0.015150022506713868 key: test_mcc value: [0.82512315 0.72064772 0.82490815 0.7257422 0.67900461 0.85933785 0.75434227 0.78571429 0.68250015 0.78772636] mean value: 0.7645046746713943 key: train_mcc value: [0.98425172 0.99214142 0.99211042 0.99606299 0.96853396 0.99215674 0.98819663 0.99607071 0.98425197 0.98819663] mean value: 0.9881973215872323 key: test_accuracy value: [0.9122807 0.85964912 0.9122807 0.85964912 0.83928571 0.92857143 0.875 0.89285714 0.83928571 0.89285714] mean value: 0.881171679197995 key: train_accuracy value: [0.99211045 0.99605523 0.99605523 0.99802761 0.98425197 0.99606299 0.99409449 0.9980315 0.99212598 0.99409449] mean value: 0.9940909938032894 key: test_fscore value: [0.9122807 0.85185185 0.91525424 0.87096774 0.84210526 0.92592593 0.88135593 0.89285714 0.84745763 0.88888889] mean value: 0.8828945312981744 key: train_fscore value: [0.99209486 0.99604743 0.99604743 0.99802761 0.98418972 
0.99604743 0.99408284 0.99803536 0.99212598 0.99410609] mean value: 0.994080476920228 key: test_precision value: [0.89655172 0.88461538 0.9 0.81818182 0.82758621 0.96153846 0.83870968 0.89285714 0.80645161 0.92307692] mean value: 0.8749568951626794 key: train_precision value: [0.99603175 1. 0.99604743 0.99606299 0.98809524 1. 0.99604743 0.99607843 0.99212598 0.99215686] mean value: 0.9952646116282663 key: test_recall value: [0.92857143 0.82142857 0.93103448 0.93103448 0.85714286 0.89285714 0.92857143 0.89285714 0.89285714 0.85714286] mean value: 0.8933497536945813 key: train_recall value: [0.98818898 0.99212598 0.99604743 1. 0.98031496 0.99212598 0.99212598 1. 0.99212598 0.99606299] mean value: 0.9929118296971772 key: test_roc_auc value: [0.91256158 0.85899015 0.91194581 0.85837438 0.83928571 0.92857143 0.875 0.89285714 0.83928571 0.89285714] mean value: 0.8809729064039409 key: train_roc_auc value: [0.9921182 0.99606299 0.99605521 0.9980315 0.98425197 0.99606299 0.99409449 0.9980315 0.99212598 0.99409449] mean value: 0.9940929320593819 key: test_jcc value: [0.83870968 0.74193548 0.84375 0.77142857 0.72727273 0.86206897 0.78787879 0.80645161 0.73529412 0.8 ] mean value: 0.7914789943937935 key: train_jcc value: [0.98431373 0.99212598 0.99212598 0.99606299 0.9688716 0.99212598 0.98823529 0.99607843 0.984375 0.98828125] mean value: 0.9882596241193021 MCC on Blind test: 0.25 Accuracy on Blind test: 0.67 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03135872 0.02370524 0.02240443 0.02254367 0.02118802 0.02509332 0.02626276 0.02403593 0.02359962 0.02351856] mean value: 0.024371027946472168 key: score_time value: [0.01250744 0.01007581 0.00941324 0.00972056 0.00988364 0.00994086 0.00985217 0.00987315 0.0091753 0.00908613] mean value: 0.009952831268310546 key: test_mcc value: [0.96547546 0.82512315 0.86189955 0.92980296 0.85714286 0.92857143 0.82195294 0.85933785 0.93094934 0.85933785] mean value: 0.883959337340627 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.9122807 0.92982456 0.96491228 0.92857143 0.96428571 0.91071429 0.92857143 0.96428571 0.92857143] mean value: 0.9414473684210526 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.9122807 0.92857143 0.96551724 0.92857143 0.96428571 0.90909091 0.93103448 0.96296296 0.93103448] mean value: 0.9415167533951563 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.89655172 0.96296296 0.96551724 0.92857143 0.96428571 0.92592593 0.9 1. 0.9 ] mean value: 0.9443814997263273 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.92857143 0.89655172 0.96551724 0.92857143 0.96428571 0.89285714 0.96428571 0.92857143 0.96428571] mean value: 0.9397783251231527 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98214286 0.91256158 0.93041872 0.96490148 0.92857143 0.96428571 0.91071429 0.92857143 0.96428571 0.92857143] mean value: 0.9415024630541873 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.83870968 0.86666667 0.93333333 0.86666667 0.93103448 0.83333333 0.87096774 0.92857143 0.87096774] mean value: 0.8904536786906087 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.47 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.14206529 0.13289452 0.14049268 0.12361646 0.12473321 0.12784219 0.12505698 0.12953353 0.14142609 0.14261723] mean value: 0.13302781581878662 key: score_time value: [0.01912665 0.01963639 0.01777506 0.01807332 0.01823997 0.01803923 0.01796126 0.01868534 0.02039599 0.01962233] mean value: 0.018755555152893066 key: test_mcc value: [0.92980296 0.64901478 0.82490815 0.75462449 0.82195294 0.89342711 0.85933785 0.82195294 0.78571429 0.82195294] mean value: 0.8162688454469251 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.96491228 0.8245614 0.9122807 0.87719298 0.91071429 0.94642857 0.92857143 0.91071429 0.89285714 0.91071429] mean value: 0.9078947368421053 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.82142857 0.91525424 0.88135593 0.9122807 0.94736842 0.93103448 0.9122807 0.89285714 0.90909091] mean value: 0.9087236814473888 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.82142857 0.9 0.86666667 0.89655172 0.93103448 0.9 0.89655172 0.89285714 0.92592593] mean value: 0.8995301952198504 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.82142857 0.93103448 0.89655172 0.92857143 0.96428571 0.96428571 0.92857143 0.89285714 0.89285714] mean value: 0.9184729064039409 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96490148 0.82450739 0.91194581 0.87684729 0.91071429 0.94642857 0.92857143 0.91071429 0.89285714 0.91071429] mean value: 0.907820197044335 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.6969697 0.84375 0.78787879 0.83870968 0.9 0.87096774 0.83870968 0.80645161 0.83333333] mean value: 0.8347805010617858 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.31 Accuracy on Blind test: 0.72 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01085973 0.01103044 0.01346946 0.01193619 0.01305199 0.01162624 0.01204753 0.01230168 0.01173186 0.01169991] mean value: 0.011975502967834473 key: score_time value: [0.01012802 0.00980639 0.01152587 0.00962973 0.00939035 0.00966001 0.00961351 0.01386833 0.0105443 0.00907779] mean value: 0.010324430465698243 key: test_mcc value: [0.57881773 0.54759338 0.75462449 0.50927421 0.4645821 0.71611487 0.50128041 0.57142857 0.53605627 0.39310793] mean value: 0.5572879969415938 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.77192982 0.87719298 0.75438596 0.73214286 0.85714286 0.75 0.78571429 0.76785714 0.69642857] mean value: 0.7782268170426065 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.78571429 0.77966102 0.88135593 0.76666667 0.73684211 0.86206897 0.75862069 0.78571429 0.76363636 0.69090909] mean value: 0.7811189402228806 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.78571429 0.74193548 0.86666667 0.74193548 0.72413793 0.83333333 0.73333333 0.78571429 0.77777778 0.7037037 ] mean value: 0.7694252285019805 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.78571429 0.82142857 0.89655172 0.79310345 0.75 0.89285714 0.78571429 0.78571429 0.75 0.67857143] mean value: 0.7939655172413793 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.78940887 0.77278325 0.87684729 0.75369458 0.73214286 0.85714286 0.75 0.78571429 0.76785714 0.69642857] mean value: 0.7782019704433497 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64705882 0.63888889 0.78787879 0.62162162 0.58333333 0.75757576 0.61111111 0.64705882 0.61764706 0.52777778] mean value: 0.6439951984069632 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.68 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.97127247 1.87376833 1.90832615 1.96241379 1.86430383 1.85158372 1.85320497 1.85398388 1.91005635 1.90927792] mean value: 1.8958191394805908 key: score_time value: [0.0927887 0.09511352 0.10197377 0.101542 0.09229207 0.09235573 0.09270048 0.09604359 0.09203935 0.09217215] mean value: 0.0949021339416504 key: test_mcc value: [0.96547546 0.8953202 0.8953202 0.82512315 0.85714286 1. 0.96490128 0.89342711 0.93094934 0.85933785] mean value: 0.9086997439278497 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.9122807 0.92857143 1. 0.98214286 0.94642857 0.96428571 0.92857143] mean value: 0.9539473684210527 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.94736842 0.94736842 0.9122807 0.92857143 1. 0.98245614 0.94736842 0.96296296 0.93103448] mean value: 0.9541229161374352 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93103448 0.96428571 0.92857143 0.92857143 1. 0.96551724 0.93103448 1. 0.9 ] mean value: 0.9549014778325123 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.9541871921182266 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.9476601 0.9476601 0.91256158 0.92857143 1. 
0.98214286 0.94642857 0.96428571 0.92857143] mean value: 0.9540024630541872 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.9 0.9 0.83870968 0.86666667 1. 0.96551724 0.9 0.92857143 0.87096774] mean value: 0.9134718470257959 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.93197846 0.98866272 1.00972748 0.9597013 0.97821736 0.96020889 0.97179317 1.03200746 0.92861557 1.0140574 ] mean value: 0.9774969816207886 key: score_time value: [0.16036201 0.25168538 0.22374582 0.18213248 0.22768426 0.23600006 0.21665478 0.25347424 0.24868369 0.23307705] mean value: 0.2233499765396118 key: test_mcc value: [0.96547546 0.86189955 0.8953202 0.82512315 0.85714286 0.93094934 0.92857143 0.85714286 0.93094934 0.89342711] mean value: 0.8946001279474248 key: train_mcc value: [0.94477296 0.96055211 0.94872473 0.96055211 0.9606597 0.95675965 0.94882625 0.95670033 0.94882625 0.95675965] mean value: 0.9543133754242162 key: test_accuracy value: [0.98245614 0.92982456 0.94736842 0.9122807 0.92857143 0.96428571 0.96428571 0.92857143 0.96428571 0.94642857] mean value: 0.9468358395989975 key: train_accuracy value: [0.97238659 0.98027613 0.97435897 0.98027613 0.98031496 0.97834646 0.97440945 0.97834646 0.97440945 0.97834646] mean value: 0.9771471058721211 key: test_fscore value: [0.98181818 0.93103448 0.94736842 0.9122807 0.92857143 0.96551724 0.96428571 0.92857143 0.96296296 0.94736842] mean value: 0.9469778984207297 key: train_fscore value: [0.97244094 0.98031496 0.97425743 0.98023715 0.98039216 0.97847358 0.97445972 0.97830375 0.97445972 0.97847358] mean value: 0.9771813002130227 key: test_precision value: [1. 0.9 0.96428571 0.92857143 0.92857143 0.93333333 0.96428571 0.92857143 1. 
0.93103448] mean value: 0.9478653530377669 key: train_precision value: [0.97244094 0.98031496 0.97619048 0.98023715 0.9765625 0.97276265 0.97254902 0.98023715 0.97254902 0.97276265] mean value: 0.9756606521047162 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 1. 0.96428571 0.92857143 0.92857143 0.96428571] mean value: 0.9470443349753694 key: train_recall value: [0.97244094 0.98031496 0.97233202 0.98023715 0.98425197 0.98425197 0.97637795 0.97637795 0.97637795 0.98425197] mean value: 0.9787214839251813 key: test_roc_auc value: [0.98214286 0.93041872 0.9476601 0.91256158 0.92857143 0.96428571 0.96428571 0.92857143 0.96428571 0.94642857] mean value: 0.9469211822660099 key: train_roc_auc value: [0.97238648 0.98027606 0.97435498 0.98027606 0.98031496 0.97834646 0.97440945 0.97834646 0.97440945 0.97834646] mean value: 0.9771466807755751 key: test_jcc value: [0.96428571 0.87096774 0.9 0.83870968 0.86666667 0.93333333 0.93103448 0.86666667 0.92857143 0.9 ] mean value: 0.9000235711637269 key: train_jcc value: [0.94636015 0.96138996 0.94980695 0.96124031 0.96153846 0.95785441 0.95019157 0.95752896 0.95019157 0.95785441] mean value: 0.9553956747621544 MCC on Blind test: 0.25 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0273037 0.01028109 0.01023984 0.01031065 0.01022434 0.01032615 0.01025367 0.01148105 0.01173735 0.01176667] mean value: 0.012392449378967284 key: score_time value: [0.00985813 0.00880551 0.00889325 0.00906706 0.00880718 0.00885367 0.009377 0.00965214 0.0097661 0.00972247] mean value: 0.009280252456665038 key: test_mcc value: [0.8953202 0.7589669 0.68472906 0.58562417 0.64450339 0.75047877 0.64285714 0.71428571 0.53881591 0.75047877] mean value: 0.6966060026275183 key: train_mcc value: [0.74761876 0.73570695 0.73177298 0.73972796 0.74414639 0.74805469 0.75211424 0.7480315 0.74812427 0.7486119 ] mean value: 0.7443909629951391 key: test_accuracy value: [0.94736842 0.87719298 0.84210526 0.78947368 0.82142857 0.875 0.82142857 0.85714286 0.76785714 0.875 ] mean value: 0.8473997493734335 key: train_accuracy value: [0.87376726 0.8678501 0.86587771 0.86982249 0.87204724 0.87401575 0.87598425 0.87401575 0.87401575 0.87401575] mean value: 0.8721412042429607 key: test_fscore value: [0.94736842 0.88135593 0.84210526 0.77777778 0.82758621 0.87719298 0.82142857 0.85714286 0.77966102 0.87719298] mean value: 0.8488812011521107 key: train_fscore value: [0.875 0.8678501 0.86507937 0.8685259 0.87128713 0.87351779 0.87475149 0.87401575 0.87301587 0.87644788] mean value: 0.8719491263936097 key: test_precision value: [0.93103448 0.83870968 0.85714286 0.84 0.8 0.86206897 0.82142857 0.85714286 0.74193548 0.86206897] mean value: 0.8411531860797712 key: train_precision value: [0.86821705 0.86956522 0.8685259 0.87550201 0.87649402 0.87698413 0.88353414 0.87401575 0.88 0.85984848] mean value: 
0.8732686696416017 key: test_recall value: [0.96428571 0.92857143 0.82758621 0.72413793 0.85714286 0.89285714 0.82142857 0.85714286 0.82142857 0.89285714] mean value: 0.858743842364532 key: train_recall value: [0.88188976 0.86614173 0.86166008 0.86166008 0.86614173 0.87007874 0.86614173 0.87401575 0.86614173 0.89370079] mean value: 0.8707572126606704 key: test_roc_auc value: [0.9476601 0.87807882 0.84236453 0.79064039 0.82142857 0.875 0.82142857 0.85714286 0.76785714 0.875 ] mean value: 0.8476600985221675 key: train_roc_auc value: [0.87375121 0.86785347 0.86586941 0.86980642 0.87204724 0.87401575 0.87598425 0.87401575 0.87401575 0.87401575] mean value: 0.8721374996109676 key: test_jcc value: [0.9 0.78787879 0.72727273 0.63636364 0.70588235 0.78125 0.6969697 0.75 0.63888889 0.78125 ] mean value: 0.7405756090314914 key: train_jcc value: [0.77777778 0.76655052 0.76223776 0.76760563 0.77192982 0.7754386 0.77738516 0.77622378 0.77464789 0.78006873] mean value: 0.7729865668599729 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08733106 0.0808742 0.07191372 0.07638454 0.07736945 0.07642531 0.08206582 0.07636857 0.07126498 0.07357764] mean value: 0.07735753059387207 key: score_time value: [0.01076412 0.01088524 0.01092649 0.01091504 0.01091599 0.01097631 0.01111364 0.0110476 0.01095295 0.01090598] mean value: 0.010940337181091308 key: test_mcc value: [0.96547546 0.8953202 0.92980296 0.93202124 0.85714286 1. 0.96490128 0.89342711 0.93094934 0.92857143] mean value: 0.9297611866340357 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.94736842 0.96491228 0.96491228 0.92857143 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.9645363408521304 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.94736842 0.96551724 0.96666667 0.92857143 1. 0.98245614 0.94736842 0.96296296 0.96428571] mean value: 0.9647015178140406 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93103448 0.96551724 0.93548387 0.92857143 1. 0.96551724 0.93103448 1. 0.96428571] mean value: 0.9621444462100747 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.9679802955665024 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98214286 0.9476601 0.96490148 0.96428571 0.92857143 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.9644704433497537 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.9 0.93333333 0.93548387 0.86666667 1. 0.96551724 0.9 0.92857143 0.93103448] mean value: 0.9324892737962815 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.34 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.044734 0.06494427 0.04393315 0.06427455 0.0506022 0.07494164 0.09088469 0.05393553 0.10816693 0.06829667] mean value: 0.06647136211395263 key: score_time value: [0.02533603 0.01248574 0.01268339 0.01244736 0.0195601 0.02371478 0.01961422 0.01231694 0.04197431 0.01899767] mean value: 0.01991305351257324 key: test_mcc value: [0.82942474 0.86189955 0.75492611 0.79110556 0.75047877 0.85933785 0.67900461 0.78571429 0.75047877 0.85933785] mean value: 0.7921708093166815 key: train_mcc value: [0.89366043 0.88593277 0.88566582 0.90532508 0.91750062 0.89774912 0.89376313 0.90553988 0.90962508 0.89001213] mean value: 0.8984774052532084 key: test_accuracy value: [0.9122807 0.92982456 0.87719298 
0.89473684 0.875 0.92857143 0.83928571 0.89285714 0.875 0.92857143] mean value: 0.8953320802005013 key: train_accuracy value: [0.94674556 0.94280079 0.94280079 0.95266272 0.95866142 0.9488189 0.94685039 0.95275591 0.95472441 0.94488189] mean value: 0.9491702775318765 key: test_fscore value: [0.91525424 0.93103448 0.87719298 0.9 0.87719298 0.92592593 0.84210526 0.89285714 0.87719298 0.92592593] mean value: 0.8964681925282066 key: train_fscore value: [0.94736842 0.94368932 0.94302554 0.95256917 0.95906433 0.94921875 0.94716243 0.95294118 0.95516569 0.94552529] mean value: 0.9495730116083545 key: test_precision value: [0.87096774 0.9 0.89285714 0.87096774 0.86206897 0.96153846 0.82758621 0.89285714 0.86206897 0.96153846] mean value: 0.8902450830593212 key: train_precision value: [0.93822394 0.93103448 0.9375 0.95256917 0.94980695 0.94186047 0.94163424 0.94921875 0.94594595 0.93461538] mean value: 0.9422409327672729 key: test_recall value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714 0.85714286 0.89285714 0.89285714 0.89285714] mean value: 0.9043103448275862 key: train_recall value: [0.95669291 0.95669291 0.9486166 0.95256917 0.96850394 0.95669291 0.95275591 0.95669291 0.96456693 0.95669291] mean value: 0.9570477109333665 key: test_roc_auc value: [0.91317734 0.93041872 0.87746305 0.89408867 0.875 0.92857143 0.83928571 0.89285714 0.875 0.92857143] mean value: 0.8954433497536947 key: train_roc_auc value: [0.9467259 0.94277333 0.94281224 0.95266254 0.95866142 0.9488189 0.94685039 0.95275591 0.95472441 0.94488189] mean value: 0.9491666926021599 key: test_jcc value: [0.84375 0.87096774 0.78125 0.81818182 0.78125 0.86206897 0.72727273 0.80645161 0.78125 0.86206897] mean value: 0.8134511831327738 key: train_jcc value: [0.9 0.89338235 0.89219331 0.90943396 0.92134831 0.90334572 0.89962825 0.91011236 0.9141791 0.89667897] mean value: 0.9040302346875264 MCC on Blind test: 0.21 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() 
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running 
model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01414633 0.01014614 0.00987864 0.00988293 0.00992298 0.00999713 0.00989652 0.01112843 0.01074934 0.01004362] mean value: 0.010579204559326172 key: score_time value: [0.00973105 0.00881386 0.00866055 0.00865984 0.00862098 0.0086832 0.0088861 0.00939202 0.0096097 0.00943041] mean value: 0.009048771858215333 key: test_mcc value: [0.8953202 0.79161589 0.82512315 0.72133224 0.5728919 0.71428571 0.71611487 0.60753044 0.60753044 0.82195294] mean value: 0.7273697782685168 key: train_mcc value: [0.75148224 0.73599419 0.72785421 0.71616261 0.74805469 0.77167747 0.76772249 0.76772249 0.73248786 0.75592895] mean value: 0.7475087174015772 key: test_accuracy value: [0.94736842 0.89473684 0.9122807 0.85964912 0.78571429 0.85714286 0.85714286 0.80357143 0.80357143 0.91071429] mean value: 0.8631892230576441 key: train_accuracy value: [0.87573964 0.8678501 0.86390533 0.85798817 0.87401575 0.88582677 0.88385827 0.88385827 0.86614173 0.87795276] mean value: 0.8737136778021091 key: test_fscore value: [0.94736842 0.89655172 0.9122807 0.85714286 0.77777778 0.85714286 0.85185185 0.8 0.80701754 0.90909091] mean value: 0.8616224643810851 key: train_fscore value: [0.8762279 0.86626747 0.86282306 0.856 0.87351779 0.88537549 0.88408644 0.88408644 0.86454183 0.87843137] mean value: 
0.8731357798405449 key: test_precision value: [0.93103448 0.86666667 0.92857143 0.88888889 0.80769231 0.85714286 0.88461538 0.81481481 0.79310345 0.92592593] mean value: 0.8698456205352757 key: train_precision value: [0.8745098 0.87854251 0.868 0.86639676 0.87698413 0.88888889 0.88235294 0.88235294 0.875 0.875 ] mean value: 0.8768027973402586 key: test_recall value: [0.96428571 0.92857143 0.89655172 0.82758621 0.75 0.85714286 0.82142857 0.78571429 0.82142857 0.89285714] mean value: 0.8545566502463054 key: train_recall value: [0.87795276 0.85433071 0.85770751 0.8458498 0.87007874 0.88188976 0.88582677 0.88582677 0.85433071 0.88188976] mean value: 0.8695683296504932 key: test_roc_auc value: [0.9476601 0.8953202 0.91256158 0.86022167 0.78571429 0.85714286 0.85714286 0.80357143 0.80357143 0.91071429] mean value: 0.8633620689655173 key: train_roc_auc value: [0.87573527 0.86787682 0.86389313 0.85796427 0.87401575 0.88582677 0.88385827 0.88385827 0.86614173 0.87795276] mean value: 0.8737123027605739 key: test_jcc value: [0.9 0.8125 0.83870968 0.75 0.63636364 0.75 0.74193548 0.66666667 0.67647059 0.83333333] mean value: 0.7605979385889253 key: train_jcc value: [0.77972028 0.76408451 0.75874126 0.74825175 0.7754386 0.79432624 0.79225352 0.79225352 0.76140351 0.78321678] mean value: 0.7749689965623754 MCC on Blind test: 0.3 Accuracy on Blind test: 0.73 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01739788 0.0220623 0.01760864 0.02229691 0.01804233 0.0232923 0.02561951 0.01866484 0.0224731 0.01789403] mean value: 0.02053518295288086 key: score_time value: [0.01050401 0.01128793 0.01169872 0.01175117 0.01168776 0.0117321 0.0117178 0.01158166 0.01171303 0.01155519] mean value: 0.011522936820983886 key: test_mcc value: [0.82880708 0.70453109 0.89952865 0.7257422 0.72168784 0.89342711 0.72168784 0.78571429 0.64951905 0.89342711] mean value: 0.7824072264760507 key: train_mcc value: [0.80155032 0.8649269 0.77915876 0.88999604 0.84134934 0.86746041 0.90163769 0.86616858 0.88257403 0.85049917] mean value: 0.8545321231378529 key: test_accuracy value: [0.9122807 0.84210526 0.94736842 0.85964912 0.85714286 0.94642857 0.85714286 0.89285714 0.82142857 0.94642857] mean value: 0.8882832080200501 key: train_accuracy value: [0.89349112 0.93096647 0.88560158 0.94477318 0.91732283 0.93307087 0.9507874 0.93307087 0.94094488 0.92519685] mean value: 0.9255226047927441 key: test_fscore value: [0.90566038 0.81632653 0.95081967 0.87096774 0.84615385 0.94736842 0.86666667 0.89285714 0.83333333 0.94545455] mean value: 0.8875608277555532 key: train_fscore value: [0.8826087 0.92813142 0.89298893 0.94552529 0.91176471 0.9348659 0.95107632 0.93280632 0.94208494 0.92578125] mean value: 0.9247633777608493 key: test_precision value: [0.96 0.95238095 0.90625 0.81818182 0.91666667 0.93103448 0.8125 0.89285714 0.78125 0.96296296] mean value: 0.8934084025808163 key: train_precision value: [0.98543689 0.96995708 0.83737024 0.93103448 0.97747748 0.91044776 
0.94552529 0.93650794 0.92424242 0.91860465] mean value: 0.9336604242135553 key: test_recall value: [0.85714286 0.71428571 1. 0.93103448 0.78571429 0.96428571 0.92857143 0.89285714 0.89285714 0.92857143] mean value: 0.8895320197044335 key: train_recall value: [0.7992126 0.88976378 0.95652174 0.96047431 0.85433071 0.96062992 0.95669291 0.92913386 0.96062992 0.93307087] mean value: 0.9200460614359964 key: test_roc_auc value: [0.91133005 0.83990148 0.94642857 0.85837438 0.85714286 0.94642857 0.85714286 0.89285714 0.82142857 0.94642857] mean value: 0.8877463054187192 key: train_roc_auc value: [0.89367745 0.9310479 0.88574118 0.94480408 0.91732283 0.93307087 0.9507874 0.93307087 0.94094488 0.92519685] mean value: 0.925566431172388 key: test_jcc value: [0.82758621 0.68965517 0.90625 0.77142857 0.73333333 0.9 0.76470588 0.80645161 0.71428571 0.89655172] mean value: 0.8010248217752062 key: train_jcc value: [0.78988327 0.86590038 0.80666667 0.89667897 0.83783784 0.87769784 0.90671642 0.87407407 0.89051095 0.86181818] mean value: 0.8607784587352857 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02330208 0.02178788 0.02393198 0.02130818 0.02126288 0.02093101 0.02174282 0.02573943 0.02018857 0.0217464 ] mean value: 0.02219412326812744 key: score_time value: [0.01169729 0.01158571 0.01161098 0.02670074 0.01293182 0.0131402 0.01216984 0.01427531 0.01238012 0.01276994] mean value: 0.013926196098327636 key: test_mcc value: [0.86189955 0.72064772 0.79161589 0.7257422 0.5242106 0.89802651 0.62705445 0.78571429 0.71428571 0.85714286] mean value: 0.7506339773602754 key: train_mcc value: [0.85755494 0.88979006 0.88714258 0.88343206 0.68829839 0.82903225 0.88411257 0.92959505 0.89020543 0.87910492] mean value: 0.8618268250222627 key: test_accuracy value: [0.92982456 0.85964912 0.89473684 0.85964912 0.73214286 0.94642857 0.80357143 0.89285714 0.85714286 0.92857143] mean value: 0.8704573934837093 key: train_accuracy value: [0.9270217 0.94477318 0.94280079 0.9408284 0.82283465 0.91141732 0.94094488 0.96456693 0.94488189 0.93897638] mean value: 0.9279046110360465 key: test_fscore value: [0.93103448 0.85185185 0.89285714 0.87096774 0.65116279 0.94915254 0.82539683 0.89285714 0.85714286 0.92857143] mean value: 0.865099480644191 key: train_fscore value: [0.93032015 0.94552529 0.94093686 0.94252874 0.78571429 0.91651206 0.94296578 0.964 0.944 0.94049904] mean value: 0.9253002206522171 key: test_precision value: [0.9 0.88461538 0.92592593 0.81818182 0.93333333 0.90322581 0.74285714 0.89285714 0.85714286 0.92857143] mean value: 0.8786710839936647 key: train_precision value: [0.89169675 0.93461538 0.97058824 0.91449814 0.9939759 0.86666667 0.91176471 
0.9796748 0.95934959 0.917603 ] mean value: 0.9340433174738031 key: test_recall value: [0.96428571 0.82142857 0.86206897 0.93103448 0.5 1. 0.92857143 0.89285714 0.85714286 0.92857143] mean value: 0.8685960591133005 key: train_recall value: [0.97244094 0.95669291 0.91304348 0.97233202 0.6496063 0.97244094 0.97637795 0.9488189 0.92913386 0.96456693] mean value: 0.9255454234228626 key: test_roc_auc value: [0.93041872 0.85899015 0.8953202 0.85837438 0.73214286 0.94642857 0.80357143 0.89285714 0.85714286 0.92857143] mean value: 0.8703817733990148 key: train_roc_auc value: [0.92693193 0.94474962 0.94274221 0.94089042 0.82283465 0.91141732 0.94094488 0.96456693 0.94488189 0.93897638] mean value: 0.9278936229809218 key: test_jcc value: [0.87096774 0.74193548 0.80645161 0.77142857 0.48275862 0.90322581 0.7027027 0.80645161 0.75 0.86666667] mean value: 0.7702588819552112 key: train_jcc value: [0.86971831 0.89667897 0.88846154 0.89130435 0.64705882 0.84589041 0.89208633 0.93050193 0.89393939 0.88768116] mean value: 0.864332121222163 MCC on Blind test: 0.1 Accuracy on Blind test: 0.21 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20345497 0.19475746 0.18883538 0.18976021 0.18943191 0.18772388 0.19340324 0.19115758 0.19200993 0.20393801] mean value: 0.19344725608825683 key: score_time value: [0.01576757 0.01642036 0.01555204 0.01572323 0.01514602 0.01556444 0.01558781 0.01524043 0.01602459 0.01608181] mean value: 0.015710830688476562 key: test_mcc value: [0.96547546 0.8951918 0.96547546 0.93202124 0.82195294 0.96490128 0.96490128 0.89342711 1. 0.92857143] mean value: 0.9331918004083978 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.94736842 0.98245614 0.96491228 0.91071429 0.98214286 0.98214286 0.94642857 1. 0.96428571] mean value: 0.9662907268170425 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.94545455 0.98305085 0.96666667 0.9122807 0.98181818 0.98245614 0.94736842 1. 0.96428571] mean value: 0.9665199400658812 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96296296 0.96666667 0.93548387 0.89655172 1. 0.96551724 0.93103448 1. 0.96428571] mean value: 0.9622502663158948 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.92857143 1. 1. 0.92857143 0.96428571 1. 0.96428571 1. 0.96428571] mean value: 0.9714285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.94704433 0.98214286 0.96428571 0.91071429 0.98214286 0.98214286 0.94642857 1. 
0.96428571] mean value: 0.9661330049261084 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.89655172 0.96666667 0.93548387 0.83870968 0.96428571 0.96551724 0.9 1. 0.93103448] mean value: 0.9362535091901054 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.39 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06323409 0.08535194 0.07566977 0.07401085 0.06657791 0.08859229 0.07154846 0.0926826 0.07722116 0.06817889] mean value: 0.07630679607391358 key: score_time value: [0.01864672 0.04004669 0.0241456 0.03599811 0.02662635 0.0338707 0.04093671 0.02519202 0.03072882 0.02305079] mean value: 0.02992424964904785 key: test_mcc value: [0.96547546 0.8953202 0.8953202 0.93202124 0.82618439 1. 0.96490128 0.89342711 0.93094934 0.92857143] mean value: 0.9232170639899975 key: train_mcc value: [0.99214142 0.99211042 0.98823457 1. 0.98819663 0.98819663 0.98428248 1. 0.99212598 0.98819663] mean value: 0.9913484783283143 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.96491228 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.9609962406015038 key: train_accuracy value: [0.99605523 0.99605523 0.99408284 1. 0.99409449 0.99409449 0.99212598 1. 0.99606299 0.99409449] mean value: 0.9956665734830483 key: test_fscore value: [0.98181818 0.94736842 0.94736842 0.96666667 0.91525424 1. 
0.98245614 0.94736842 0.96296296 0.96428571] mean value: 0.9615549166530434 key: train_fscore value: [0.99604743 0.99606299 0.99403579 1. 0.99408284 0.99408284 0.99209486 1. 0.99606299 0.99410609] mean value: 0.9956575832877012 key: test_precision value: [1. 0.93103448 0.96428571 0.93548387 0.87096774 1. 0.96551724 0.93103448 1. 0.96428571] mean value: 0.9562609248371206 key: train_precision value: [1. 0.99606299 1. 1. 0.99604743 0.99604743 0.99603175 1. 0.99606299 0.99215686] mean value: 0.9972409454688892 key: test_recall value: [0.96428571 0.96428571 0.93103448 1. 0.96428571 1. 1. 0.96428571 0.92857143 0.96428571] mean value: 0.968103448275862 key: train_recall value: [0.99212598 0.99606299 0.98814229 1. 0.99212598 0.99212598 0.98818898 1. 0.99606299 0.99606299] mean value: 0.994089819800193 key: test_roc_auc value: [0.98214286 0.9476601 0.9476601 0.96428571 0.91071429 1. 0.98214286 0.94642857 0.96428571 0.96428571] mean value: 0.960960591133005 key: train_roc_auc value: [0.99606299 0.99605521 0.99407115 1. 0.99409449 0.99409449 0.99212598 1. 0.99606299 0.99409449] mean value: 0.9956661790793937 key: test_jcc value: [0.96428571 0.9 0.9 0.93548387 0.84375 1. 0.96551724 0.9 0.92857143 0.93103448] mean value: 0.9268642737962816 key: train_jcc value: [0.99212598 0.99215686 0.98814229 1. 0.98823529 0.98823529 0.98431373 1. 
0.99215686 0.98828125] mean value: 0.9913647565957774 MCC on Blind test: 0.09 Accuracy on Blind test: 0.32 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.16241002 0.13786769 0.23004556 0.15227127 0.16346526 0.1342628 0.15097213 0.17534328 0.17445135 0.1752553 ] mean value: 0.16563446521759034 key: score_time value: [0.02536011 0.02174473 0.02583504 0.01519656 0.02511072 0.01517773 0.02864337 0.02516484 0.02513885 0.02539349] mean value: 0.023276543617248534 key: test_mcc value: [0.85960591 0.54592083 0.65104858 0.61405719 0.3992747 0.71428571 0.53605627 0.60753044 0.64285714 0.85714286] mean value: 0.6427779638491756 key: train_mcc value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98428248 0.98437404 0.98437404 0.98437404 0.98428248 0.98437404] mean value: 0.9851217587100868 key: test_accuracy value: [0.92982456 0.77192982 0.8245614 0.80701754 0.69642857 0.85714286 0.76785714 0.80357143 0.82142857 0.92857143] mean value: 0.8208333333333333 key: train_accuracy value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598] mean value: 0.9925142493283015 key: test_fscore value: [0.92857143 0.75471698 0.82142857 0.81355932 0.66666667 
0.85714286 0.76363636 0.8 0.82142857 0.92857143] mean value: 0.8155722190611862 key: train_fscore value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99209486 0.99206349 0.99206349 0.99206349 0.99209486 0.99206349] mean value: 0.9924634247376443 key: test_precision value: [0.92857143 0.8 0.85185185 0.8 0.73913043 0.85714286 0.77777778 0.81481481 0.82142857 0.92857143] mean value: 0.8319289164941339 key: train_precision value: [1. 1. 1. 1. 0.99603175 1. 1. 1. 0.99603175 1. ] mean value: 0.9992063492063492 key: test_recall value: [0.92857143 0.71428571 0.79310345 0.82758621 0.60714286 0.85714286 0.75 0.78571429 0.82142857 0.92857143] mean value: 0.8013546798029556 key: train_recall value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197 0.98425197 0.98425197 0.98818898 0.98425197] mean value: 0.985815878746382 key: test_roc_auc value: [0.92980296 0.77093596 0.82512315 0.80665025 0.69642857 0.85714286 0.76785714 0.80357143 0.82142857 0.92857143] mean value: 0.8207512315270936 key: train_roc_auc value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598] mean value: 0.9925142385857895 key: test_jcc value: [0.86666667 0.60606061 0.6969697 0.68571429 0.5 0.75 0.61764706 0.66666667 0.6969697 0.86666667] mean value: 0.6953361344537815 key: train_jcc value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98431373 0.98425197 0.98425197 0.98425197 0.98431373 0.98425197] mean value: 0.9850408285688307 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.7869184 0.77875566 0.78472948 0.7729466 0.76530933 0.76920581 0.77312517 0.7658267 0.76497579 0.7665813 ] mean value: 0.7728374242782593 key: score_time value: [0.00956082 0.010185 0.01012588 0.00918317 0.00925159 0.00918603 0.00912333 0.00915837 0.00957084 0.00959945] mean value: 0.009494447708129882 key: test_mcc value: [0.96547546 0.92980296 0.92980296 0.93202124 0.82195294 1. 0.92857143 0.89342711 0.96490128 0.92857143] mean value: 0.9294526803513998 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.96491228 0.91071429 1. 0.96428571 0.94642857 0.98214286 0.96428571] mean value: 0.9645050125313284 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.96428571 0.96551724 0.96666667 0.9122807 1. 0.96428571 0.94736842 0.98181818 0.96428571] mean value: 0.9648326537346501 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96551724 0.93548387 0.89655172 1. 0.96428571 0.93103448 1. 0.96428571] mean value: 0.9621444462100747 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 1. 0.92857143 1. 0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9679802955665024 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.96490148 0.96490148 0.96428571 0.91071429 1. 
0.96428571 0.94642857 0.98214286 0.96428571] mean value: 0.964408866995074 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.93103448 0.93333333 0.93548387 0.83870968 1. 0.93103448 0.9 0.96428571 0.93103448] mean value: 0.9329201758567721 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.29 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables 
are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03118086 0.03124666 0.03096771 0.03132343 0.03108811 0.03085017 0.03105617 0.03075123 0.03121066 0.03153753] mean value: 0.031121253967285156 key: score_time value: [0.01248264 0.01258159 0.01248908 0.01303363 0.01314592 0.01312351 0.01312232 0.01303387 0.01307392 0.0132153 ] mean value: 0.012930178642272949 key: test_mcc value: [ 0.30265542 -0.06746787 0.35337918 0.15195767 -0.06262243 0.05399492 0.11547005 0.05399492 0.57735027 0.18650096] mean value: 0.16652131068133877 key: train_mcc value: [0.56403512 0.41093503 0.5405667 0.41258679 0.36596253 0.32302914 0.31554255 0.4796084 0.91257312 0.33769082] mean value: 0.4662530185948502 key: test_accuracy value: [0.61403509 0.47368421 0.64912281 0.56140351 0.48214286 0.51785714 0.53571429 0.51785714 0.78571429 0.57142857] mean value: 0.5708959899749373 key: train_accuracy value: [0.74161736 
0.64497041 0.72583826 0.64497041 0.61811024 0.59448819 0.59055118 0.68700787 0.95472441 0.6023622 ] mean value: 0.6804640544192331 key: test_fscore value: [0.7027027 0.625 0.72972973 0.67532468 0.63291139 0.64935065 0.66666667 0.64935065 0.8 0.67567568] mean value: 0.6806712141205812 key: train_fscore value: [0.79499218 0.73837209 0.78449612 0.73760933 0.72364672 0.71148459 0.70949721 0.76161919 0.95652174 0.71549296] mean value: 0.7633732133244073 key: test_precision value: [0.56521739 0.48076923 0.6 0.54166667 0.49019608 0.51020408 0.52 0.51020408 0.75 0.54347826] mean value: 0.5511735791306489 key: train_precision value: [0.65974026 0.58525346 0.64540816 0.58429561 0.56696429 0.55217391 0.54978355 0.61501211 0.92 0.55701754] mean value: 0.6235648890174496 key: test_recall value: [0.92857143 0.89285714 0.93103448 0.89655172 0.89285714 0.89285714 0.92857143 0.89285714 0.85714286 0.89285714] mean value: 0.9006157635467981 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99606299 1. ] mean value: 0.9996062992125985 key: test_roc_auc value: [0.61945813 0.48091133 0.64408867 0.55541872 0.48214286 0.51785714 0.53571429 0.51785714 0.78571429 0.57142857] mean value: 0.5710591133004926 key: train_roc_auc value: [0.74110672 0.64426877 0.72637795 0.64566929 0.61811024 0.59448819 0.59055118 0.68700787 0.95472441 0.6023622 ] mean value: 0.6804666832653824 key: test_jcc value: [0.54166667 0.45454545 0.57446809 0.50980392 0.46296296 0.48076923 0.5 0.48076923 0.66666667 0.51020408] mean value: 0.5181856300687876 key: train_jcc value: [0.65974026 0.58525346 0.64540816 0.58429561 0.56696429 0.55217391 0.54978355 0.61501211 0.91666667 0.55701754] mean value: 0.6232315556841161 MCC on Blind test: -0.06 Accuracy on Blind test: 0.18 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), 
('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02687025 0.03934693 0.03566217 0.03851652 0.04220271 0.03784752 0.03835869 0.03784919 0.03812003 0.03818917] mean value: 0.03729631900787354 key: score_time value: [0.01908278 0.01851916 0.02274084 0.01836586 0.01849699 0.01842642 0.01837707 0.01831055 0.01833153 0.01832366] mean value: 0.018897485733032227 key: test_mcc value: [0.92980296 0.8953202 0.82512315 0.82490815 0.75434227 0.82195294 0.67900461 0.78571429 0.71611487 0.82618439] mean value: 0.8058467820455303 key: train_mcc value: [0.86198955 0.86998617 0.86194018 0.86999628 0.88599845 0.85850727 0.86624915 0.878014 0.87040934 0.86237183] mean value: 0.8685462219654635 key: test_accuracy value: [0.96491228 0.94736842 0.9122807 0.9122807 0.875 0.91071429 0.83928571 0.89285714 0.85714286 0.91071429] mean value: 0.9022556390977443 key: train_accuracy value: [0.93096647 0.93491124 0.93096647 0.93491124 0.94291339 0.92913386 0.93307087 0.93897638 0.93503937 0.93110236] mean value: 0.9341991644535558 key: test_fscore value: [0.96428571 0.94736842 0.9122807 0.91525424 0.88135593 0.90909091 0.84210526 0.89285714 0.86206897 0.90566038] mean value: 0.9032327664565936 key: train_fscore value: [0.93150685 0.93567251 0.93096647 0.93542074 0.94346979 0.92996109 0.93359375 0.93933464 0.93592233 0.93177388] mean value: 0.9347622049276256 key: test_precision value: [0.96428571 0.93103448 0.92857143 0.9 0.83870968 0.92592593 0.82758621 0.89285714 0.83333333 0.96 ] mean value: 
0.9002303912048072 key: train_precision value: [0.92607004 0.92664093 0.92913386 0.92635659 0.93436293 0.91923077 0.92635659 0.93385214 0.92337165 0.92277992] mean value: 0.9268155416074748 key: test_recall value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714 0.85714286 0.89285714 0.89285714 0.85714286] mean value: 0.9077586206896552 key: train_recall value: [0.93700787 0.94488189 0.93280632 0.94466403 0.95275591 0.94094488 0.94094488 0.94488189 0.9488189 0.94094488] mean value: 0.942865145809343 key: test_roc_auc value: [0.96490148 0.9476601 0.91256158 0.91194581 0.875 0.91071429 0.83928571 0.89285714 0.85714286 0.91071429] mean value: 0.9022783251231528 key: train_roc_auc value: [0.93095453 0.93489154 0.93097009 0.93493044 0.94291339 0.92913386 0.93307087 0.93897638 0.93503937 0.93110236] mean value: 0.9341982820329278 key: test_jcc value: [0.93103448 0.9 0.83870968 0.84375 0.78787879 0.83333333 0.72727273 0.80645161 0.75757576 0.82758621] mean value: 0.8253592586038359 key: train_jcc value: [0.87179487 0.87912088 0.87084871 0.87867647 0.89298893 0.86909091 0.87545788 0.88560886 0.87956204 0.87226277] mean value: 0.8775412318035963 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27225637 0.29213047 0.28225875 0.29532313 0.28092909 0.27798748 0.30030775 0.27729392 0.29576445 0.31901908] mean value: 0.2893270492553711 key: score_time value: [0.01860166 0.01854444 0.01844335 0.01861453 0.01857495 0.01867414 0.0185442 0.01859784 0.01850152 0.01851773] mean value: 0.018561434745788575 key: test_mcc value: [0.92980296 0.8953202 0.85960591 0.82490815 0.75434227 0.82195294 0.67900461 0.78571429 0.71611487 0.82618439] mean value: 0.8092950579075993 key: train_mcc value: [0.86198955 0.86998617 0.88168563 0.86999628 0.90174953 0.85850727 0.86624915 0.878014 0.87040934 0.86237183] mean value: 0.8720958746760563 key: test_accuracy value: [0.96491228 0.94736842 0.92982456 0.9122807 0.875 0.91071429 0.83928571 0.89285714 0.85714286 0.91071429] mean value: 0.9040100250626566 key: train_accuracy value: [0.93096647 0.93491124 0.9408284 0.93491124 0.9507874 0.92913386 0.93307087 0.93897638 0.93503937 0.93110236] mean value: 0.9359727593222444 key: test_fscore value: [0.96428571 0.94736842 0.93103448 0.91525424 0.88135593 0.90909091 0.84210526 0.89285714 0.86206897 0.90566038] mean value: 0.905108144557017 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:155: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:158: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[0.93150685 0.93567251 0.94094488 0.93542074 0.95126706 0.92996109 0.93359375 0.93933464 0.93592233 0.93177388]
mean value: 0.9365397732693178

key: test_precision
value: [0.96428571 0.93103448 0.93103448 0.9 0.83870968 0.92592593 0.82758621 0.89285714 0.83333333 0.96 ]
mean value: 0.9004766966235265

key: train_precision
value: [0.92607004 0.92664093 0.9372549 0.92635659 0.94208494 0.91923077 0.92635659 0.93385214 0.92337165 0.92277992]
mean value: 0.9283998467489825

key: test_recall
value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 0.89285714 0.85714286 0.89285714 0.89285714 0.85714286]
mean value: 0.9112068965517242

key: train_recall
value: [0.93700787 0.94488189 0.94466403 0.94466403 0.96062992 0.94094488 0.94094488 0.94488189 0.9488189 0.94094488]
mean value: 0.9448383181351343

key: test_roc_auc
value: [0.96490148 0.9476601 0.92980296 0.91194581 0.875 0.91071429 0.83928571 0.89285714 0.85714286 0.91071429]
mean value: 0.9040024630541872

key: train_roc_auc
value: [0.93095453 0.93489154 0.94083595 0.93493044 0.9507874 0.92913386 0.93307087 0.93897638 0.93503937 0.93110236]
mean value: 0.9359722697706265

key: test_jcc
value: [0.93103448 0.9 0.87096774 0.84375 0.78787879 0.83333333 0.72727273 0.80645161 0.75757576 0.82758621]
mean value: 0.8285850650554488

key: train_jcc
value: [0.87179487 0.87912088 0.88847584 0.87867647 0.9070632 0.86909091 0.87545788 0.88560886 0.87956204 0.87226277]
mean value: 0.8807113713116829

MCC on Blind test: 0.25
Accuracy on Blind test: 0.7

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03605223 0.03439736 0.03540349 0.03663993 0.03655434 0.03590679 0.03478384 0.03613806 0.03570008 0.03664112]
mean value: 0.035821723937988284

key: score_time
value: [0.01276493 0.01178789 0.01285243 0.01279426 0.01268625 0.01275706 0.0128274 0.01273704 0.01279736 0.01278663]
mean value: 0.012679123878479004

key: test_mcc
value: [0.74935731 0.75033796 0.63745526 0.81878307 0.96423926 0.89139151 0.81878307 0.96423926 0.75724019 0.81878307]
mean value: 0.8170609946666495

key: train_mcc
value: [0.86672653 0.87081606 0.88732456 0.87070654 0.85465174 0.862886 0.86274286 0.85865182 0.87881806 0.86667428]
mean value: 0.8679998468826917

key: test_accuracy
value: [0.87272727 0.87272727 0.81818182 0.90909091 0.98181818 0.94545455 0.90909091 0.98181818 0.87272727 0.90909091]
mean value: 0.9072727272727272

key: train_accuracy
value: [0.93333333 0.93535354 0.94343434 0.93535354 0.92727273 0.93131313 0.93131313 0.92929293 0.93939394 0.93333333]
mean value: 0.933939393939394

key: test_fscore
value: [0.8627451 0.87719298 0.80769231 0.90909091 0.98113208 0.94736842 0.90909091 0.98245614 0.8852459 0.90909091]
mean value: 0.9071105653974942

key: train_fscore
value: [0.93386774 0.936 0.94444444 0.93548387 0.928 0.932 0.93172691 0.92957746 0.93951613 0.93333333]
mean value: 0.9343949885667974

key: test_precision
value: [0.91666667 0.83333333 0.84 0.89285714 1. 0.93103448 0.92592593 0.96551724 0.81818182 0.92592593]
mean value: 0.9049442537028745

key: train_precision
value: [0.92828685 0.92857143 0.9296875 0.93548387 0.92063492 0.92094862 0.92430279 0.924 0.93574297 0.93145161]
mean value: 0.927911056299992

key: test_recall
value: [0.81481481 0.92592593 0.77777778 0.92592593 0.96296296 0.96428571 0.89285714 1. 0.96428571 0.89285714]
mean value: 0.9121693121693122

key: train_recall
value: [0.93951613 0.94354839 0.95967742 0.93548387 0.93548387 0.94331984 0.93927126 0.93522267 0.94331984 0.93522267]
mean value: 0.9410065952722999

key: test_roc_auc
value: [0.87169312 0.87367725 0.81746032 0.90939153 0.98148148 0.94510582 0.90939153 0.98148148 0.87103175 0.90939153]
mean value: 0.907010582010582

key: train_roc_auc
value: [0.93332082 0.93533695 0.94340146 0.93535327 0.92725611 0.93133734 0.93132918 0.92930488 0.93940185 0.93333714]
mean value: 0.9339378999608202

key: test_jcc
value: [0.75862069 0.78125 0.67741935 0.83333333 0.96296296 0.9 0.83333333 0.96551724 0.79411765 0.83333333]
mean value: 0.8339887895894978

key: train_jcc
value: [0.87593985 0.87969925 0.89473684 0.87878788 0.86567164 0.87265918 0.87218045 0.86842105 0.88593156 0.875 ]
mean value: 0.876902769915327

MCC on Blind test: 0.28
Accuracy on Blind test: 0.7

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
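
Note on the ConvergenceWarning messages in this log: they are raised by scikit-learn's lbfgs solver in `_logistic.py` when it stops at its iteration limit, and the warning text itself suggests raising `max_iter` or scaling the data. The pipeline printed in this log already applies `MinMaxScaler`, so the sketch below tries only the `max_iter` route. It is a minimal illustration under assumptions, not the run's actual configuration: the data is synthetic (`make_classification`), and `max_iter=5000` is an arbitrary illustrative value (the scikit-learn default is 100).

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real feature matrix (hypothetical data,
# sized like the 557-row training set reported in the log).
X, y = make_classification(n_samples=557, n_features=20, random_state=42)

pipe = Pipeline(steps=[
    ("prep", MinMaxScaler()),
    # max_iter=5000 is an assumed, illustrative limit; the default is 100.
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])

# Record warnings during fitting so any ConvergenceWarning can be counted
# instead of flooding the console.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=ConvergenceWarning)
    pipe.fit(X, y)

convergence_warnings = [
    w for w in caught if issubclass(w.category, ConvergenceWarning)
]
# n_iter_ holds the number of solver iterations actually used.
n_iter = int(pipe.named_steps["model"].n_iter_[0])

print("ConvergenceWarning count:", len(convergence_warnings))
print("lbfgs iterations used:", n_iter)
```

If `n_iter` comes out well below the limit, the solver converged and the limit can be lowered; if warnings persist on the real data, the solver options in the linked scikit-learn documentation are the next thing to check.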
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.83404994 0.86711574 0.86436319 0.86811352 0.81349635 0.76629972 0.91054893 0.83312821 0.85565472 0.7771697 ] mean value: 0.8389940023422241 key: score_time value: [0.01317811 0.01305819 0.01305723 0.01334643 0.01304197 0.01310301 0.01307392 0.01307011 0.0126493 0.0120368 ] mean value: 0.012961506843566895 key: test_mcc value: [0.74935731 0.79069197 0.67602163 0.81878307 0.92962225 0.74569602 0.85449735 0.96423926 0.75724019 0.78410665] mean value: 0.8070255698567865 key: train_mcc value: [0.90329065 0.89091503 0.91922028 0.89905801 0.94355919 0.95574863 0.898996 0.89091503 0.84242919 0.82225124] mean value: 0.8966383251997698 key: test_accuracy value: [0.87272727 0.89090909 0.83636364 0.90909091 0.96363636 0.87272727 0.92727273 0.98181818 0.87272727 0.89090909] mean value: 0.9018181818181817 key: train_accuracy value: [0.95151515 0.94545455 0.95959596 0.94949495 0.97171717 0.97777778 0.94949495 
0.94545455 0.92121212 0.91111111] mean value: 0.9482828282828283 key: test_fscore value: [0.8627451 0.89655172 0.82352941 0.90909091 0.96153846 0.87719298 0.92857143 0.98245614 0.8852459 0.88888889] mean value: 0.9015810946477902 key: train_fscore value: [0.95219124 0.94567404 0.95983936 0.94929006 0.97154472 0.97750511 0.94929006 0.94523327 0.92089249 0.91129032] mean value: 0.9482750669610251 key: test_precision value: [0.91666667 0.83870968 0.875 0.89285714 1. 0.86206897 0.92857143 0.96551724 0.81818182 0.92307692] mean value: 0.9020649863669886 key: train_precision value: [0.94094488 0.9437751 0.956 0.95510204 0.9795082 0.98760331 0.95121951 0.94715447 0.92276423 0.90763052] mean value: 0.9491702259084599 key: test_recall value: [0.81481481 0.96296296 0.77777778 0.92592593 0.92592593 0.89285714 0.92857143 1. 0.96428571 0.85714286] mean value: 0.9050264550264551 key: train_recall value: [0.96370968 0.94758065 0.96370968 0.94354839 0.96370968 0.96761134 0.94736842 0.94331984 0.91902834 0.91497976] mean value: 0.9474565756823822 key: test_roc_auc value: [0.87169312 0.89219577 0.83531746 0.90939153 0.96296296 0.8723545 0.92724868 0.98148148 0.87103175 0.89153439] mean value: 0.901521164021164 key: train_roc_auc value: [0.95149047 0.94545024 0.95958763 0.94950699 0.97173338 0.97775728 0.94949066 0.94545024 0.92120772 0.91111891] mean value: 0.9482793522267207 key: test_jcc value: [0.75862069 0.8125 0.7 0.83333333 0.92592593 0.78125 0.86666667 0.96551724 0.79411765 0.8 ] mean value: 0.8237931504019232 key: train_jcc value: [0.90874525 0.89694656 0.92277992 0.9034749 0.94466403 0.956 0.9034749 0.89615385 0.85338346 0.83703704] mean value: 0.9022659915221568 MCC on Blind test: 0.26 Accuracy on Blind test: 0.65 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), 
('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01397133 0.01245117 0.01030517 0.00996041 0.01001978 0.00983381 0.00986528 0.00995183 0.00990844 0.01001573] mean value: 0.010628294944763184 key: score_time value: [0.01229095 0.0092237 0.00907922 0.00885463 0.00884295 0.00878549 0.00874043 0.00866818 0.00869727 0.00873613] mean value: 0.00919189453125 key: test_mcc value: [0.49137176 0.63624339 0.56841568 0.63745526 0.80032673 0.57068493 0.75878131 0.79069197 0.52777778 0.68504815] mean value: 0.646679695350377 key: train_mcc value: [0.69712309 0.67315631 0.74497319 0.66701918 0.66780236 0.6884516 0.6712994 0.65867563 0.6757794 0.66760924] mean value: 0.6811889391928496 key: test_accuracy value: [0.74545455 0.81818182 0.78181818 0.81818182 0.89090909 0.78181818 0.87272727 0.89090909 0.76363636 0.83636364] mean value: 0.8200000000000001 key: train_accuracy value: [0.84848485 0.83232323 0.87070707 0.83030303 0.83030303 0.84040404 0.83232323 0.82626263 0.83434343 0.83030303] mean value: 0.8375757575757576 key: test_fscore value: [0.73076923 0.81481481 0.76 0.80769231 0.875 0.76923077 0.8627451 0.88461538 0.76363636 0.82352941] mean value: 0.8092033380562792 key: train_fscore value: [0.84725051 0.81838074 0.86440678 0.81818182 0.8173913 0.82713348 0.81917211 0.81304348 0.8209607 0.81659389] mean value: 0.8262514811253847 key: test_precision value: [0.76 0.81481481 0.82608696 0.84 1. 
0.83333333 0.95652174 0.95833333 0.77777778 0.91304348] mean value: 0.8679911433172303 key: train_precision value: [0.85596708 0.89473684 0.91071429 0.88317757 0.88679245 0.9 0.88679245 0.87793427 0.89099526 0.88625592] mean value: 0.8873366138897277 key: test_recall value: [0.7037037 0.81481481 0.7037037 0.77777778 0.77777778 0.71428571 0.78571429 0.82142857 0.75 0.75 ] mean value: 0.7599206349206349 key: train_recall value: [0.83870968 0.75403226 0.82258065 0.76209677 0.75806452 0.76518219 0.7611336 0.75708502 0.7611336 0.75708502] mean value: 0.7737103304166123 key: test_roc_auc value: [0.74470899 0.81812169 0.78042328 0.81746032 0.88888889 0.78306878 0.87433862 0.89219577 0.76388889 0.83796296] mean value: 0.8201058201058201 key: train_roc_auc value: [0.84850464 0.83248172 0.87080449 0.8304411 0.83044926 0.84025238 0.8321797 0.82612316 0.83419583 0.83015541] mean value: 0.837558769753167 key: test_jcc value: [0.57575758 0.6875 0.61290323 0.67741935 0.77777778 0.625 0.75862069 0.79310345 0.61764706 0.7 ] mean value: 0.6825729130935079 key: train_jcc value: [0.73498233 0.69259259 0.76119403 0.69230769 0.69117647 0.70522388 0.69372694 0.68498168 0.6962963 0.6900369 ] mean value: 0.7042518817008117 MCC on Blind test: 0.31 Accuracy on Blind test: 0.73 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01024437 0.0100975 0.01008296 0.01021576 0.01011634 0.01013732 0.01008844 0.01011109 0.01027632 0.01016641] mean value: 0.010153651237487793 key: score_time value: [0.00875211 0.00881267 0.00876999 0.00872159 0.0086987 0.00877166 0.00879073 0.00876713 0.0087781 0.00872397] mean value: 0.00875866413116455 key: test_mcc value: [0.56841568 0.65330526 0.60000053 0.75033796 0.89139151 0.63841116 0.78174603 0.75878131 0.52935027 0.6005291 ] mean value: 0.6772268811002418 key: train_mcc value: [0.7134319 0.76975822 0.75369821 0.71362312 0.74951431 0.76567678 0.77794469 0.71736756 0.70631188 0.72166787] mean value: 0.738899455109274 key: test_accuracy value: [0.78181818 0.81818182 0.8 0.87272727 0.94545455 0.81818182 0.89090909 0.87272727 0.76363636 0.8 ] mean value: 0.8363636363636364 key: train_accuracy value: [0.85656566 0.88484848 0.87676768 0.85656566 0.87474747 0.88282828 0.88888889 0.85858586 0.85252525 0.86060606] mean value: 0.8692929292929292 key: test_fscore value: [0.76 0.83333333 0.79245283 0.87719298 0.94339623 0.81481481 0.89285714 0.8627451 0.77966102 0.8 ] mean value: 0.8356453445053573 key: train_fscore value: [0.85480573 0.88438134 0.87576375 0.85420945 0.87550201 0.88211382 0.88977956 0.85655738 0.84759916 0.85773196] mean value: 0.8678444146780728 key: test_precision value: [0.82608696 0.75757576 0.80769231 0.83333333 0.96153846 0.84615385 0.89285714 0.95652174 0.74193548 0.81481481] mean value: 0.8438509843488806 key: train_precision value: [0.86721992 0.88979592 0.88477366 0.87029289 0.872 0.88571429 0.88095238 0.86721992 0.875 0.87394958] mean value: 
0.8766918548471572 key: test_recall value: [0.7037037 0.92592593 0.77777778 0.92592593 0.92592593 0.78571429 0.89285714 0.78571429 0.82142857 0.78571429] mean value: 0.8330687830687831 key: train_recall value: [0.84274194 0.87903226 0.86693548 0.83870968 0.87903226 0.87854251 0.89878543 0.84615385 0.82186235 0.84210526] mean value: 0.8593901005615776 key: test_roc_auc value: [0.78042328 0.82010582 0.79960317 0.87367725 0.94510582 0.81878307 0.89087302 0.87433862 0.76256614 0.80026455] mean value: 0.8365740740740741 key: train_roc_auc value: [0.85659364 0.88486026 0.87678758 0.8566018 0.8747388 0.88281964 0.88890884 0.85856079 0.85246343 0.86056876] mean value: 0.869290355230508 key: test_jcc value: [0.61290323 0.71428571 0.65625 0.78125 0.89285714 0.6875 0.80645161 0.75862069 0.63888889 0.66666667] mean value: 0.7215673941063262 key: train_jcc value: [0.74642857 0.79272727 0.77898551 0.74551971 0.77857143 0.78909091 0.80144404 0.74910394 0.73550725 0.75090253] mean value: 0.766828116175246 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00968528 0.01099563 0.01088142 0.01142859 0.01120138 0.01143241 0.01128864 0.01136971 0.01154232 0.01143885] mean value: 0.011126422882080078 key: score_time value: [0.01223397 0.01313019 0.01311469 0.01409554 0.01288962 0.01380253 0.01319766 0.01341391 0.01384592 0.01357269] mean value: 0.013329672813415527 key: test_mcc value: [0.50088476 0.45502646 0.30952381 0.52715278 0.56441351 0.28210909 0.68504815 0.68504815 0.42602426 0.53758181] mean value: 0.4972812768008551 key: train_mcc value: [0.69697662 0.69314677 0.68904093 0.64067939 0.6782269 0.7057203 0.69371399 0.66901351 0.75103262 0.68538123] mean value: 0.6902932259968104 key: test_accuracy value: [0.74545455 0.72727273 0.65454545 0.76363636 0.78181818 0.63636364 0.83636364 0.83636364 0.70909091 0.76363636] mean value: 0.7454545454545455 key: train_accuracy value: [0.84848485 0.84646465 0.84444444 0.82020202 0.83838384 0.85252525 0.84646465 0.83434343 0.87474747 0.84242424] mean value: 0.8448484848484848 key: test_fscore value: [0.70833333 0.72727273 0.65454545 0.75471698 0.76923077 0.6 0.82352941 0.82352941 0.74193548 0.74509804] mean value: 0.7348191612130426 key: train_fscore value: [0.84848485 0.84489796 0.84317719 0.81799591 0.83333333 0.84886128 0.84232365 0.83127572 0.87029289 0.83884298] mean value: 0.8419485757928358 key: test_precision value: [0.80952381 0.71428571 0.64285714 0.76923077 0.8 0.68181818 0.91304348 0.91304348 0.67647059 0.82608696] mean value: 0.774636011899439 key: train_precision value: [0.85020243 0.8553719 0.85185185 0.82987552 0.86206897 0.86864407 0.86382979 0.84518828 
0.9004329 0.85654008] mean value: 0.8584005790388104 key: test_recall value: [0.62962963 0.74074074 0.66666667 0.74074074 0.74074074 0.53571429 0.75 0.75 0.82142857 0.67857143] mean value: 0.7054232804232804 key: train_recall value: [0.84677419 0.83467742 0.83467742 0.80645161 0.80645161 0.82995951 0.82186235 0.81781377 0.84210526 0.82186235] mean value: 0.8262635496930912 key: test_roc_auc value: [0.74338624 0.72751323 0.6547619 0.76322751 0.78108466 0.63822751 0.83796296 0.83796296 0.70701058 0.76521164] mean value: 0.7456349206349207 key: train_roc_auc value: [0.84848831 0.84648851 0.84446422 0.82022986 0.83844848 0.85247976 0.84641505 0.83431011 0.87468166 0.84238279] mean value: 0.8448388729267338 key: test_jcc value: [0.5483871 0.57142857 0.48648649 0.60606061 0.625 0.42857143 0.7 0.7 0.58974359 0.59375 ] mean value: 0.5849427779064875 key: train_jcc value: [0.73684211 0.73144876 0.72887324 0.69204152 0.71428571 0.73741007 0.72759857 0.71126761 0.77037037 0.72241993] mean value: 0.7272557887808211 MCC on Blind test: 0.26 Accuracy on Blind test: 0.68 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02347016 0.02215576 0.0214541 0.02343726 0.021842 0.02241516 0.02220798 0.02189755 0.0212822 0.02229095] mean value: 0.022245311737060548 key: score_time value: [0.01336884 0.01203799 0.01242447 0.01285195 0.01233602 0.01203322 0.01214242 0.0121491 0.0117538 0.01247334] mean value: 0.012357115745544434 key: test_mcc value: [0.60876172 0.78410665 0.60000053 0.85695439 0.92962225 0.81854376 0.81878307 0.92724868 0.82269299 0.81878307] mean value: 0.7985497110226856 key: train_mcc value: [0.81823674 0.8101084 0.8222215 0.79800833 0.78586238 0.79800173 0.79797897 0.79394672 0.80207347 0.79797897] mean value: 0.8024417204598691 key: test_accuracy value: [0.8 0.89090909 0.8 0.92727273 0.96363636 0.90909091 0.90909091 0.96363636 0.90909091 0.90909091] mean value: 0.8981818181818182 key: train_accuracy value: [0.90909091 0.90505051 0.91111111 0.8989899 0.89292929 0.8989899 0.8989899 0.8969697 0.9010101 0.8989899 ] mean value: 0.9012121212121212 key: test_fscore value: [0.7755102 0.89285714 0.79245283 0.92857143 0.96153846 0.9122807 0.90909091 0.96428571 0.91525424 0.90909091] mean value: 0.8960932538747399 key: train_fscore value: [0.90981964 0.90505051 0.91129032 0.89878543 0.89336016 0.89837398 0.89878543 0.8969697 0.90020367 0.89878543] mean value: 0.9011424249876461 key: test_precision value: [0.86363636 0.86206897 0.80769231 0.89655172 1. 
0.89655172 0.92592593 0.96428571 0.87096774 0.92592593] mean value: 0.9013606393194825 key: train_precision value: [0.90438247 0.90688259 0.91129032 0.90243902 0.89156627 0.90204082 0.89878543 0.89516129 0.9057377 0.89878543] mean value: 0.9017071335013342 key: test_recall value: [0.7037037 0.92592593 0.77777778 0.96296296 0.92592593 0.92857143 0.89285714 0.96428571 0.96428571 0.89285714] mean value: 0.8939153439153439 key: train_recall value: [0.91532258 0.90322581 0.91129032 0.89516129 0.89516129 0.89473684 0.89878543 0.89878543 0.89473684 0.89878543] mean value: 0.9005991249836751 key: test_roc_auc value: [0.79828042 0.89153439 0.79960317 0.92791005 0.96296296 0.90873016 0.90939153 0.96362434 0.90806878 0.90939153] mean value: 0.8979497354497354 key: train_roc_auc value: [0.90907829 0.9050542 0.91111075 0.89899765 0.89292477 0.89898132 0.89898949 0.89697336 0.90099745 0.89898949] mean value: 0.9012096774193549 key: test_jcc value: [0.63333333 0.80645161 0.65625 0.86666667 0.92592593 0.83870968 0.83333333 0.93103448 0.84375 0.83333333] mean value: 0.8168788365673794 key: train_jcc value: [0.83455882 0.82656827 0.83703704 0.81617647 0.80727273 0.81549815 0.81617647 0.81318681 0.81851852 0.81617647] mean value: 0.8201169751973421 MCC on Blind test: 0.25 Accuracy on Blind test: 0.72 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.93131471 1.97601247 1.99174571 2.14477992 2.12438631 1.95171905 1.95633554 2.3193562 1.9155972 2.00993013] mean value: 2.03211772441864 key: score_time value: [0.01248431 0.01242208 0.01508594 0.01516485 0.01528311 0.01378775 0.01732278 0.01664519 0.01326823 0.01835799] mean value: 0.014982223510742188 key: test_mcc value: [0.74935731 0.75878131 0.56841568 0.81878307 0.92962225 0.78174603 0.82337971 0.92724868 0.86334835 0.81878307] mean value: 0.8039465463217836 key: train_mcc value: [0.99195142 0.9878869 0.99596768 0.99195168 1. 0.99195142 0.99596768 0.9878869 0.99596768 1. ] mean value: 0.9939531339647318 key: test_accuracy value: [0.87272727 0.87272727 0.78181818 0.90909091 0.96363636 0.89090909 0.90909091 0.96363636 0.92727273 0.90909091] mean value: 0.9 key: train_accuracy value: [0.9959596 0.99393939 0.9979798 0.9959596 1. 0.9959596 0.9979798 0.99393939 0.9979798 1. ] mean value: 0.996969696969697 key: test_fscore value: [0.8627451 0.88135593 0.76 0.90909091 0.96153846 0.89285714 0.90566038 0.96428571 0.93333333 0.90909091] mean value: 0.8979957877797566 key: train_fscore value: [0.99598394 0.99393939 0.99798793 0.99595142 1. 0.99593496 0.9979716 0.99393939 0.9979716 1. 
] mean value: 0.9969680232408948 key: test_precision value: [0.91666667 0.8125 0.82608696 0.89285714 1. 0.89285714 0.96 0.96428571 0.875 0.92592593] mean value: 0.9066179549114332 key: train_precision value: [0.992 0.99595142 0.99598394 1. 1. 1. 1. 0.99193548 1. 1. ] mean value: 0.9975870836617988 key: test_recall value: [0.81481481 0.96296296 0.7037037 0.92592593 0.92592593 0.89285714 0.85714286 0.96428571 1. 0.89285714] mean value: 0.8940476190476191 key: train_recall value: [1. 0.99193548 1. 0.99193548 1. 0.99190283 0.99595142 0.99595142 0.99595142 1. ] mean value: 0.9963628052762178 key: test_roc_auc value: [0.87169312 0.87433862 0.78042328 0.90939153 0.96296296 0.89087302 0.91005291 0.96362434 0.92592593 0.90939153] mean value: 0.8998677248677249 key: train_roc_auc value: [0.99595142 0.99394345 0.99797571 0.99596774 1. 0.99595142 0.99797571 0.99394345 0.99797571 1. ] mean value: 0.996968460232467 key: test_jcc value: [0.75862069 0.78787879 0.61290323 0.83333333 0.92592593 0.80645161 0.82758621 0.93103448 0.875 0.83333333] mean value: 0.8192067598491403 key: train_jcc value: [0.992 0.98795181 0.99598394 0.99193548 1. 0.99190283 0.99595142 0.98795181 0.99595142 1. 
] mean value: 0.9939628702087965 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02939534 0.02539611 0.02019739 0.02184558 0.02218199 0.02318048 0.02253628 0.02189136 0.02258277 0.02346587] mean value: 0.023267316818237304 key: score_time value: [0.01231742 0.0091126 0.00892138 0.00938058 0.00929308 0.00980258 0.0096314 0.00899553 0.00903773 0.00899863] mean value: 0.009549093246459962 key: test_mcc value: [0.8565805 0.78353876 0.78174603 0.89139151 0.96423926 0.78353876 0.92724868 0.96428571 0.89139151 0.89139151] mean value: 0.8735352224131899 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92727273 0.89090909 0.89090909 0.94545455 0.98181818 0.89090909 0.96363636 0.98181818 0.94545455 0.94545455] mean value: 0.9363636363636363 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92307692 0.88461538 0.88888889 0.94339623 0.98113208 0.89655172 0.96428571 0.98181818 0.94736842 0.94736842] mean value: 0.935850196081508 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96 0.92 0.88888889 0.96153846 1. 0.86666667 0.96428571 1. 0.93103448 0.93103448] mean value: 0.9423448696896972 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.85185185 0.88888889 0.92592593 0.96296296 0.92857143 0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9304232804232804 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9265873 0.89021164 0.89087302 0.94510582 0.98148148 0.89021164 0.96362434 0.98214286 0.94510582 0.94510582] mean value: 0.9360449735449736 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85714286 0.79310345 0.8 0.89285714 0.96296296 0.8125 0.93103448 0.96428571 0.9 0.9 ] mean value: 0.881388660828316 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.54 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12487578 0.12705588 0.13158202 0.12779379 0.12834001 0.12796426 0.12580824 0.12442112 0.12178755 0.12430549] mean value: 0.12639341354370118 key: score_time value: [0.01797676 0.01992846 0.01879454 0.01852512 0.01992798 0.01827836 0.01831627 0.01806569 0.01809311 0.0183239 ] mean value: 0.01862301826477051 key: test_mcc value: [0.82269299 0.79069197 0.67284827 0.85695439 0.92962225 0.78353876 0.78410665 0.89139151 0.75724019 0.78410665] mean value: 0.8073193627447607 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90909091 0.89090909 0.83636364 0.92727273 0.96363636 0.89090909 0.89090909 0.94545455 0.87272727 0.89090909] mean value: 0.9018181818181817 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90196078 0.89655172 0.83018868 0.92857143 0.96153846 0.89655172 0.88888889 0.94736842 0.8852459 0.88888889] mean value: 0.9025754902414515 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95833333 0.83870968 0.84615385 0.89655172 1. 0.86666667 0.92307692 0.93103448 0.81818182 0.92307692] mean value: 0.9001785394805417 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85185185 0.96296296 0.81481481 0.96296296 0.92592593 0.92857143 0.85714286 0.96428571 0.96428571 0.85714286] mean value: 0.908994708994709 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90806878 0.89219577 0.83597884 0.92791005 0.96296296 0.89021164 0.89153439 0.94510582 0.87103175 0.89153439] mean value: 0.9016534391534392 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82142857 0.8125 0.70967742 0.86666667 0.92592593 0.8125 0.8 0.9 0.79411765 0.8 ] mean value: 0.8242816230434826 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.31 Accuracy on Blind test: 0.71 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01087308 0.01074958 0.010571 0.01048541 0.0106163 0.01041913 0.01044679 0.01051927 0.01077652 0.01043153] mean value: 0.010588860511779786 key: score_time value: [0.00972962 0.0091846 0.00901866 0.00877166 0.00883484 0.00898385 0.00970221 0.00887561 0.00894332 0.00887108] mean value: 0.009091544151306152 key: test_mcc value: [0.27734221 0.62202265 0.45601459 0.53121272 0.56841568 0.67328042 0.78410665 0.38267891 0.42602426 0.56841568] mean value: 0.5289513774861617 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.63636364 0.8 0.72727273 0.76363636 0.78181818 0.83636364 0.89090909 0.69090909 0.70909091 0.78181818] mean value: 0.7618181818181818 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.65517241 0.81967213 0.70588235 0.77192982 0.76 0.83636364 0.88888889 0.71186441 0.74193548 0.8 ] mean value: 0.7691709138346379 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.61290323 0.73529412 0.75 0.73333333 0.82608696 0.85185185 0.92307692 0.67741935 0.67647059 0.75 ] mean value: 0.7536436351311362 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.7037037 0.92592593 0.66666667 0.81481481 0.7037037 0.82142857 0.85714286 0.75 0.82142857 0.85714286] mean value: 0.7921957671957671 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63756614 0.80224868 0.72619048 0.76455026 0.78042328 0.83664021 0.89153439 0.68981481 0.70701058 0.78042328] mean value: 0.7616402116402117 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.48717949 0.69444444 0.54545455 0.62857143 0.61290323 0.71875 0.8 0.55263158 0.58974359 0.66666667] mean value: 0.6296344966813983 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.19 Accuracy on Blind test: 0.65 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.8867898 1.81904387 1.8012898 1.79828095 1.82392001 1.80473995 1.81928992 1.82994795 1.81334281 1.79255438] mean value: 1.8189199447631836 key: score_time value: [0.102422 0.09241891 0.094944 0.09771085 0.09159255 0.09206796 0.14494443 0.09352231 0.09137368 0.09122992] mean value: 0.09922266006469727 key: test_mcc value: [0.8565805 0.92980214 0.82269299 0.89139151 1. 0.89139151 0.96423926 1. 0.96423926 0.89139151] mean value: 0.9211728680565975 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92727273 0.96363636 0.90909091 0.94545455 1. 0.94545455 0.98181818 1. 0.98181818 0.94545455] mean value: 0.96 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92307692 0.96428571 0.90196078 0.94339623 1. 0.94736842 0.98245614 1. 0.98245614 0.94736842] mean value: 0.9592368770898475 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96 0.93103448 0.95833333 0.96153846 1. 
0.93103448 0.96551724 1. 0.96551724 0.93103448] mean value: 0.9604009725906277 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 0.85185185 0.92592593 1. 0.96428571 1. 1. 1. 0.96428571] mean value: 0.9595238095238096 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9265873 0.96428571 0.90806878 0.94510582 1. 0.94510582 0.98148148 1. 0.98148148 0.94510582] mean value: 0.9597222222222223 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85714286 0.93103448 0.82142857 0.89285714 1. 0.9 0.96551724 1. 0.96551724 0.9 ] mean value: 0.9233497536945813 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.93221402 0.93724418 0.96414828 1.00427461 0.99941468 1.05338717 0.99944305 1.02059126 0.98394442 0.99001122] mean value: 0.9884672880172729 key: score_time value: [0.27788615 0.26738977 0.26741028 0.21771598 0.24798799 0.282516 0.21504283 0.24273372 0.21704197 0.2816062 ] mean value: 0.2517330884933472 key: test_mcc value: [0.81854376 0.92980214 0.82269299 0.89139151 0.96423926 0.89153439 0.92724868 1. 0.92962225 0.89139151] mean value: 0.9066466495296084 key: train_mcc value: [0.95556281 0.95151495 0.95971983 0.95556354 0.94748184 0.95556354 0.95154681 0.95151495 0.95151495 0.95962779] mean value: 0.9539611013324247 key: test_accuracy value: [0.90909091 0.96363636 0.90909091 0.94545455 0.98181818 0.94545455 0.96363636 1. 0.96363636 0.94545455] mean value: 0.9527272727272728 key: train_accuracy value: [0.97777778 0.97575758 0.97979798 0.97777778 0.97373737 0.97777778 0.97575758 0.97575758 0.97575758 0.97979798] mean value: 0.9769696969696969 key: test_fscore value: [0.90566038 0.96428571 0.90196078 0.94339623 0.98113208 0.94545455 0.96428571 1. 0.96551724 0.94736842] mean value: 0.9519061100016925 key: train_fscore value: [0.9778672 0.97580645 0.98 0.97777778 0.97384306 0.97777778 0.97580645 0.9757085 0.9757085 0.97983871] mean value: 0.9770134434076782 key: test_precision value: [0.92307692 0.93103448 0.95833333 0.96153846 1. 0.96296296 0.96428571 1. 
0.93333333 0.93103448] mean value: 0.956559969404797 key: train_precision value: [0.97590361 0.97580645 0.97222222 0.97975709 0.97188755 0.97580645 0.97188755 0.9757085 0.9757085 0.97590361] mean value: 0.9750591543834124 key: test_recall value: [0.88888889 1. 0.85185185 0.92592593 0.96296296 0.92857143 0.96428571 1. 1. 0.96428571] mean value: 0.9486772486772487 key: train_recall value: [0.97983871 0.97580645 0.98790323 0.97580645 0.97580645 0.97975709 0.97975709 0.9757085 0.9757085 0.98380567] mean value: 0.9789898132427843 key: test_roc_auc value: [0.90873016 0.96428571 0.90806878 0.94510582 0.98148148 0.9457672 0.96362434 1. 0.96296296 0.94510582] mean value: 0.9525132275132275 key: train_roc_auc value: [0.97777361 0.97575748 0.97978157 0.97778177 0.97373319 0.97778177 0.97576564 0.97575748 0.97575748 0.97980606] mean value: 0.9769696029776674 key: test_jcc value: [0.82758621 0.93103448 0.82142857 0.89285714 0.96296296 0.89655172 0.93103448 1. 0.93333333 0.9 ] mean value: 0.9096788907133735 key: train_jcc value: [0.95669291 0.95275591 0.96078431 0.95652174 0.94901961 0.95652174 0.95275591 0.95256917 0.95256917 0.96047431] mean value: 0.955066477246029 MCC on Blind test: 0.25 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02516007 0.01044226 0.01031733 0.01038718 0.01034856 0.0114646 0.01037955 0.01027536 0.01097727 0.01110005] mean value: 0.012085223197937011 key: score_time value: [0.01297903 0.00912786 0.00893593 0.00900245 0.00891113 0.00889015 0.00875902 0.00933337 0.00927258 0.00886822] mean value: 0.009407973289489746 key: test_mcc value: [0.56841568 0.65330526 0.60000053 0.75033796 0.89139151 0.63841116 0.78174603 0.75878131 0.52935027 0.6005291 ] mean value: 0.6772268811002418 key: train_mcc value: [0.7134319 0.76975822 0.75369821 0.71362312 0.74951431 0.76567678 0.77794469 0.71736756 0.70631188 0.72166787] mean value: 0.738899455109274 key: test_accuracy value: [0.78181818 0.81818182 0.8 0.87272727 0.94545455 0.81818182 0.89090909 0.87272727 0.76363636 0.8 ] mean value: 0.8363636363636364 key: train_accuracy value: [0.85656566 0.88484848 0.87676768 0.85656566 0.87474747 0.88282828 0.88888889 0.85858586 0.85252525 0.86060606] mean value: 0.8692929292929292 key: test_fscore value: [0.76 0.83333333 0.79245283 0.87719298 0.94339623 0.81481481 0.89285714 0.8627451 0.77966102 0.8 ] mean value: 0.8356453445053573 key: train_fscore value: [0.85480573 0.88438134 0.87576375 0.85420945 0.87550201 0.88211382 0.88977956 0.85655738 0.84759916 0.85773196] mean value: 0.8678444146780728 key: test_precision value: [0.82608696 0.75757576 0.80769231 0.83333333 0.96153846 0.84615385 0.89285714 0.95652174 0.74193548 0.81481481] mean value: 0.8438509843488806 key: train_precision value: [0.86721992 0.88979592 0.88477366 0.87029289 0.872 0.88571429 0.88095238 0.86721992 0.875 0.87394958] mean value: 
0.8766918548471572 key: test_recall value: [0.7037037 0.92592593 0.77777778 0.92592593 0.92592593 0.78571429 0.89285714 0.78571429 0.82142857 0.78571429] mean value: 0.8330687830687831 key: train_recall value: [0.84274194 0.87903226 0.86693548 0.83870968 0.87903226 0.87854251 0.89878543 0.84615385 0.82186235 0.84210526] mean value: 0.8593901005615776 key: test_roc_auc value: [0.78042328 0.82010582 0.79960317 0.87367725 0.94510582 0.81878307 0.89087302 0.87433862 0.76256614 0.80026455] mean value: 0.8365740740740741 key: train_roc_auc value: [0.85659364 0.88486026 0.87678758 0.8566018 0.8747388 0.88281964 0.88890884 0.85856079 0.85246343 0.86056876] mean value: 0.869290355230508 key: test_jcc value: [0.61290323 0.71428571 0.65625 0.78125 0.89285714 0.6875 0.80645161 0.75862069 0.63888889 0.66666667] mean value: 0.7215673941063262 key: train_jcc value: [0.74642857 0.79272727 0.77898551 0.74551971 0.77857143 0.78909091 0.80144404 0.74910394 0.73550725 0.75090253] mean value: 0.766828116175246 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.13260365 0.07910442 0.07843709 0.08254647 0.08715963 0.08041072 0.08045745 0.08163977 0.08075643 0.08242631] mean value: 0.0865541934967041 key: score_time value: [0.01190138 0.01126909 0.01197362 0.01131248 0.01109767 0.01134634 0.01114702 0.01117444 0.01196313 0.01197171] mean value: 0.0115156888961792 key: test_mcc value: [0.92724868 1. 0.85449735 0.89139151 1. 0.8565805 0.96423926 1. 0.96423926 0.89139151] mean value: 0.9349588068599265 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96363636 1. 0.92727273 0.94545455 1. 0.92727273 0.98181818 1. 0.98181818 0.94545455] mean value: 0.9672727272727273 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 1. 0.92592593 0.94339623 1. 0.93103448 0.98245614 1. 0.98245614 0.94736842] mean value: 0.967560029981699 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 1. 0.92592593 0.96153846 1. 0.9 0.96551724 1. 0.96551724 0.93103448] mean value: 0.9612496315944592 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.92592593 0.92592593 1. 0.96428571 1. 1. 1. 0.96428571] mean value: 0.9743386243386243 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96362434 1. 0.92724868 0.94510582 1. 0.9265873 0.98148148 1. 
0.98148148 0.94510582] mean value: 0.9670634920634921 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 1. 0.86206897 0.89285714 1. 0.87096774 0.96551724 1. 0.96551724 0.9 ] mean value: 0.9385499761639917 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.35 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), 
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04187107 0.0752697 0.04425359 0.06683969 0.07338858 0.07722521 0.04798603 0.05801558 0.04432964 0.10096288] mean value: 0.06301419734954834 key: score_time value: [0.01879668 0.01285887 0.02497196 0.01250243 0.02248597 0.01239944 0.02228522 0.01246166 0.02562261 0.01915598] mean value: 0.018354082107543947 key: test_mcc value: [0.64214885 0.7112589 0.57574525 0.82337971 0.92962225 0.89139151 0.74569602 0.89153439 0.80032673 0.74569602] mean value: 0.7756799630794246 key: train_mcc value: [0.9071347 0.89097143 0.91551261 0.90707697 0.88686823 0.88698776 0.89498 0.89498 0.89091681 0.89091681] mean value: 0.8966345315165652 key: test_accuracy value: [0.81818182 0.85454545 0.78181818 0.90909091 0.96363636 0.94545455 0.87272727 0.94545455 0.89090909 0.87272727] mean value: 0.8854545454545455 key: train_accuracy value: 
[0.95353535 0.94545455 0.95757576 0.95353535 0.94343434 0.94343434 0.94747475 0.94747475 0.94545455 0.94545455] mean value: 0.9482828282828283 key: test_fscore value: [0.8 0.85714286 0.75 0.9122807 0.96153846 0.94736842 0.87719298 0.94545455 0.90322581 0.87719298] mean value: 0.8831396758306775 key: train_fscore value: [0.95390782 0.94589178 0.9582505 0.95372233 0.94354839 0.9437751 0.94758065 0.94758065 0.94545455 0.94545455] mean value: 0.9485166298950366 key: test_precision value: [0.86956522 0.82758621 0.85714286 0.86666667 1. 0.93103448 0.86206897 0.96296296 0.82352941 0.86206897] mean value: 0.8862625736618152 key: train_precision value: [0.94820717 0.94023904 0.94509804 0.95180723 0.94354839 0.93625498 0.9437751 0.9437751 0.94354839 0.94354839] mean value: 0.9439801825444007 key: test_recall value: [0.74074074 0.88888889 0.66666667 0.96296296 0.92592593 0.96428571 0.89285714 0.92857143 1. 0.89285714] mean value: 0.8863756613756614 key: train_recall value: [0.95967742 0.9516129 0.97177419 0.95564516 0.94354839 0.951417 0.951417 0.951417 0.94736842 0.94736842] mean value: 0.9531245918767142 key: test_roc_auc value: [0.81679894 0.85515873 0.7797619 0.91005291 0.96296296 0.94510582 0.8723545 0.9457672 0.88888889 0.8723545 ] mean value: 0.8849206349206349 key: train_roc_auc value: [0.95352292 0.94544208 0.95754702 0.95353108 0.94343411 0.94345044 0.9474827 0.9474827 0.9454584 0.9454584 ] mean value: 0.9482809847198642 key: test_jcc value: [0.66666667 0.75 0.6 0.83870968 0.92592593 0.9 0.78125 0.89655172 0.82352941 0.78125 ] mean value: 0.7963883405914585 key: train_jcc value: [0.91187739 0.8973384 0.91984733 0.91153846 0.89312977 0.89353612 0.90038314 0.90038314 0.89655172 0.89655172] mean value: 0.9021137211926713 MCC on Blind test: 0.2 Accuracy on Blind test: 0.66 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), 
('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02523851 0.01029706 0.01058316 0.01143408 0.01133108 0.01030755 0.01012635 0.01112747 0.00999737 0.0097971 ] mean value: 0.01202397346496582 key: score_time value: [0.01250792 0.00929117 0.00969481 0.00978851 0.00898647 0.00870633 0.00878239 0.00980616 0.00981569 0.0087533 ] mean value: 0.009613275527954102 key: test_mcc value: [0.60876172 0.79069197 0.52715278 0.81878307 0.92962225 0.78353876 0.89153439 0.67729621 0.68300095 0.67729621] mean value: 0.7387678309675754 key: train_mcc value: [0.73356087 0.76975822 0.7738449 0.72535511 0.75760346 0.76166531 0.76567678 0.71326401 0.73390987 0.72956825] mean value: 0.74642067826384 key: test_accuracy value: [0.8 0.89090909 0.76363636 0.90909091 0.96363636 0.89090909 0.94545455 0.83636364 0.83636364 0.83636364] mean value: 0.8672727272727272 key: train_accuracy value: [0.86666667 0.88484848 0.88686869 0.86262626 0.87878788 0.88080808 0.88282828 0.85656566 0.86666667 0.86464646] mean value: 0.8731313131313132 key: test_fscore value: [0.7755102 0.89655172 0.75471698 0.90909091 0.96153846 0.89655172 0.94545455 0.83018868 0.85245902 0.83018868] mean value: 0.8652250924457494 key: train_fscore value: [0.86530612 0.88438134 0.88617886 0.86178862 0.87854251 0.87983707 0.88211382 0.85480573 0.86363636 0.862423 ] mean value: 0.8719013426889961 key: test_precision value: [0.86363636 0.83870968 0.76923077 0.89285714 1. 
0.86666667 0.96296296 0.88 0.78787879 0.88 ] mean value: 0.8741942370652048 key: train_precision value: [0.87603306 0.88979592 0.89344262 0.86885246 0.88211382 0.8852459 0.88571429 0.86363636 0.88185654 0.875 ] mean value: 0.8801690970398393 key: test_recall value: [0.7037037 0.96296296 0.74074074 0.92592593 0.92592593 0.92857143 0.92857143 0.78571429 0.92857143 0.78571429] mean value: 0.8616402116402117 key: train_recall value: [0.85483871 0.87903226 0.87903226 0.85483871 0.875 0.87449393 0.87854251 0.84615385 0.84615385 0.85020243] mean value: 0.8638288494188324 key: test_roc_auc value: [0.79828042 0.89219577 0.76322751 0.90939153 0.96296296 0.89021164 0.9457672 0.83730159 0.83465608 0.83730159] mean value: 0.8671296296296296 key: train_roc_auc value: [0.86669061 0.88486026 0.88688455 0.86264203 0.87879555 0.88079535 0.88281964 0.85654467 0.86662531 0.86461734] mean value: 0.8731275303643725 key: test_jcc value: [0.63333333 0.8125 0.60606061 0.83333333 0.92592593 0.8125 0.89655172 0.70967742 0.74285714 0.70967742] mean value: 0.768241690435795 key: train_jcc value: [0.76258993 0.79272727 0.79562044 0.75714286 0.7833935 0.78545455 0.78909091 0.74642857 0.76 0.75812274] mean value: 0.7730570767345278 MCC on Blind test: 0.29 Accuracy on Blind test: 0.72 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01875067 0.01982927 0.02383804 0.01903248 0.01827717 0.02330303 0.02174616 0.01853609 0.02163601 0.02033186] mean value: 0.020528078079223633 key: score_time value: [0.01003098 0.01140165 0.01187396 0.01201582 0.01195359 0.01196742 0.01209927 0.01293659 0.01179004 0.01193452] mean value: 0.011800384521484375 key: test_mcc value: [0.75724019 0.70899471 0.57068493 0.78353876 1. 0.76980036 0.8565805 0.3105295 0.76980036 0.85695439] mean value: 0.7384123695263163 key: train_mcc value: [0.87485317 0.85422763 0.8581304 0.83110746 0.80103924 0.78322194 0.86171594 0.40327111 0.87905164 0.87531676] mean value: 0.8021935299759908 key: test_accuracy value: [0.87272727 0.85454545 0.78181818 0.89090909 1. 0.87272727 0.92727273 0.58181818 0.87272727 0.92727273] mean value: 0.8581818181818182 key: train_accuracy value: [0.93535354 0.92525253 0.92525253 0.91111111 0.8969697 0.88282828 0.92929293 0.64040404 0.93939394 0.93737374] mean value: 0.8923232323232323 key: test_fscore value: [0.85714286 0.85185185 0.79310345 0.88461538 1. 0.88888889 0.93103448 0.3030303 0.88888889 0.92592593] mean value: 0.8324482031378583 key: train_fscore value: [0.93220339 0.9217759 0.93005671 0.90434783 0.90359168 0.89377289 0.93203883 0.43670886 0.94 0.93608247] mean value: 0.8730578571342905 key: test_precision value: [0.95454545 0.85185185 0.74193548 0.92 1. 0.8 0.9 1. 0.8 0.96153846] mean value: 0.8929871251806736 key: train_precision value: [0.98214286 0.96888889 0.87544484 0.98113208 0.85053381 0.81605351 0.89552239 1. 
0.92885375 0.95378151] mean value: 0.9252353636501418 key: test_recall value: [0.77777778 0.85185185 0.85185185 0.85185185 1. 1. 0.96428571 0.17857143 1. 0.89285714] mean value: 0.8369047619047619 key: train_recall value: [0.88709677 0.87903226 0.99193548 0.83870968 0.96370968 0.98785425 0.97165992 0.27935223 0.951417 0.91902834] mean value: 0.8669795611858431 key: test_roc_auc value: [0.87103175 0.85449735 0.78306878 0.89021164 1. 0.87037037 0.9265873 0.58928571 0.87037037 0.92791005] mean value: 0.8583333333333333 key: train_roc_auc value: [0.93545122 0.92534609 0.92511754 0.91125767 0.8968346 0.88304003 0.92937835 0.63967611 0.93941818 0.93733675] mean value: 0.8922856536502547 key: test_jcc value: [0.75 0.74193548 0.65714286 0.79310345 1. 0.8 0.87096774 0.17857143 0.8 0.86206897] mean value: 0.7453789925313841 key: train_jcc value: [0.87301587 0.85490196 0.86925795 0.82539683 0.82413793 0.80794702 0.87272727 0.27935223 0.88679245 0.87984496] mean value: 0.7973374474147499 MCC on Blind test: 0.21 Accuracy on Blind test: 0.49 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01975441 0.01947474 0.01754856 0.01995182 0.02250051 0.01994252 0.02293873 0.02241492 0.02045321 0.02638674] mean value: 0.02113661766052246 key: score_time value: [0.01200104 0.01194429 0.01183605 0.01182842 0.01182103 0.01209521 0.01194048 0.01198792 0.01181817 0.01200318] mean value: 0.011927580833435059 key: test_mcc value: [0.74569602 0.68491749 0.60268595 0.81878307 0.92962225 0.75724019 0.72754449 0.89153439 0.86334835 0.85695439] mean value: 0.7878326593584918 key: train_mcc value: [0.87731732 0.79931631 0.8834206 0.87890613 0.88545816 0.83650879 0.86174381 0.9071347 0.88686823 0.90920534] mean value: 0.8725879390008385 key: test_accuracy value: [0.87272727 0.81818182 0.8 0.90909091 0.96363636 0.87272727 0.85454545 0.94545455 0.92727273 0.92727273] mean value: 0.889090909090909 key: train_accuracy value: [0.93737374 0.89494949 0.94141414 0.93939394 0.94141414 0.91515152 0.92727273 0.95353535 0.94343434 0.95353535] mean value: 0.9347474747474748 key: test_fscore value: [0.86792453 0.84375 0.78431373 0.90909091 0.96153846 0.8852459 0.84 0.94545455 0.93333333 0.92592593] mean value: 0.8896577330774602 key: train_fscore value: [0.93980583 0.90262172 0.94045175 0.93902439 0.93920335 0.91984733 0.92207792 0.95315682 0.94331984 0.95178197] mean value: 0.9351290919849996 key: test_precision value: [0.88461538 0.72972973 0.83333333 0.89285714 1. 
0.81818182 0.95454545 0.96296296 0.875 0.96153846] mean value: 0.8912764287764288 key: train_precision value: [0.90636704 0.84265734 0.958159 0.94672131 0.97816594 0.8700361 0.99069767 0.95901639 0.94331984 0.98695652] mean value: 0.9382097158751853 key: test_recall value: [0.85185185 1. 0.74074074 0.92592593 0.92592593 0.96428571 0.75 0.92857143 1. 0.89285714] mean value: 0.898015873015873 key: train_recall value: [0.97580645 0.97177419 0.9233871 0.93145161 0.90322581 0.9757085 0.86234818 0.94736842 0.94331984 0.91902834] mean value: 0.9353418440642549 key: test_roc_auc value: [0.8723545 0.82142857 0.7989418 0.90939153 0.96296296 0.87103175 0.85648148 0.9457672 0.92592593 0.92791005] mean value: 0.8892195767195767 key: train_roc_auc value: [0.93729594 0.89479398 0.94145063 0.93941002 0.94149145 0.91527361 0.92714183 0.95352292 0.94343411 0.95346578] mean value: 0.9347280266422882 key: test_jcc value: [0.76666667 0.72972973 0.64516129 0.83333333 0.92592593 0.79411765 0.72413793 0.89655172 0.875 0.86206897] mean value: 0.8052693213726715 key: train_jcc value: [0.88644689 0.8225256 0.8875969 0.88505747 0.88537549 0.85159011 0.85542169 0.91050584 0.89272031 0.908 ] mean value: 0.8785240284120172 MCC on Blind test: 0.4 Accuracy on Blind test: 0.94 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20455599 0.1857903 0.19286776 0.19176698 0.18908429 0.19593883 0.19246387 0.19231415 0.18916607 0.18785882] mean value: 0.19218070507049562 key: score_time value: [0.01529026 0.01592922 0.01679468 0.01615262 0.01680255 0.0168097 0.01640892 0.01586556 0.01548862 0.01537132] mean value: 0.016091346740722656 key: test_mcc value: [0.92724868 0.96428571 0.85449735 0.89139151 0.92962225 0.89602867 0.92724868 0.96428571 0.96423926 0.89153439] mean value: 0.9210382220929846 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96363636 0.98181818 0.92727273 0.94545455 0.96363636 0.94545455 0.96363636 0.98181818 0.98181818 0.94545455] mean value: 0.96 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.98181818 0.92592593 0.94339623 0.96153846 0.94915254 0.96428571 0.98181818 0.98245614 0.94545455] mean value: 0.9598808882942826 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 0.96428571 0.92592593 0.96153846 1. 0.90322581 0.96428571 1. 0.96551724 0.96296296] mean value: 0.9610704789792666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.92592593 0.92592593 0.92592593 1. 0.96428571 0.96428571 1. 0.92857143] mean value: 0.9597883597883597 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96362434 0.98214286 0.92724868 0.94510582 0.96296296 0.94444444 0.96362434 0.98214286 0.98148148 0.9457672 ] mean value: 0.9598544973544973 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.96428571 0.86206897 0.89285714 0.92592593 0.90322581 0.93103448 0.96428571 0.96551724 0.89655172] mean value: 0.9234324146170643 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.36 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.07000065 0.07286143 0.07021427 0.07388616 0.08600783 0.09426713 0.07593942 0.0796895 0.08389831 0.09681273] mean value: 0.08035774230957031 key: score_time value: [0.02999687 0.02562594 0.02819896 0.02181721 0.03845429 0.03784633 0.02426839 0.0302527 0.02698541 0.0410769 ] mean value: 0.03045229911804199 key: test_mcc value: [0.92724868 0.89642146 0.85449735 0.8565805 0.96428571 0.92962225 0.96423926 1. 0.89139151 0.85449735] mean value: 0.9138784079339198 key: train_mcc value: [0.9838707 0.9838707 0.9878867 0.97980573 0.98383832 0.97172522 0.98795103 0.98387018 0.97980573 0.98387018] mean value: 0.9826494492650794 key: test_accuracy value: [0.96363636 0.94545455 0.92727273 0.92727273 0.98181818 0.96363636 0.98181818 1. 
0.94545455 0.92727273] mean value: 0.9563636363636363 key: train_accuracy value: [0.99191919 0.99191919 0.99393939 0.98989899 0.99191919 0.98585859 0.99393939 0.99191919 0.98989899 0.99191919] mean value: 0.9913131313131313 key: test_fscore value: [0.96296296 0.94736842 0.92592593 0.92307692 0.98181818 0.96551724 0.98245614 1. 0.94736842 0.92857143] mean value: 0.9565065646190873 key: train_fscore value: [0.99190283 0.99190283 0.99396378 0.98993964 0.99193548 0.98585859 0.99389002 0.99186992 0.98985801 0.99186992] mean value: 0.9912991028204244 key: test_precision value: [0.96296296 0.9 0.92592593 0.96 0.96428571 0.93333333 0.96551724 1. 0.93103448 0.92857143] mean value: 0.9471631089217296 key: train_precision value: [0.99593496 0.99593496 0.99196787 0.98795181 0.99193548 0.98387097 1. 0.99591837 0.99186992 0.99591837] mean value: 0.9931302702420014 key: test_recall value: [0.96296296 1. 0.92592593 0.88888889 1. 1. 1. 1. 0.96428571 0.92857143] mean value: 0.9670634920634921 key: train_recall value: [0.98790323 0.98790323 0.99596774 0.99193548 0.99193548 0.98785425 0.98785425 0.98785425 0.98785425 0.98785425] mean value: 0.9894916416351052 key: test_roc_auc value: [0.96362434 0.94642857 0.92724868 0.9265873 0.98214286 0.96296296 0.98148148 1. 0.94510582 0.92724868] mean value: 0.9562830687830688 key: train_roc_auc value: [0.99192732 0.99192732 0.99393529 0.98989487 0.99191916 0.98586261 0.99392713 0.991911 0.98989487 0.991911 ] mean value: 0.991311055243568 key: test_jcc value: [0.92857143 0.9 0.86206897 0.85714286 0.96428571 0.93333333 0.96551724 1. 
0.9 0.86666667] mean value: 0.9177586206896552 key: train_jcc value: [0.98393574 0.98393574 0.988 0.98007968 0.984 0.97211155 0.98785425 0.98387097 0.97991968 0.98387097] mean value: 0.9827578586214413 MCC on Blind test: 0.09 Accuracy on Blind test: 0.32 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.17707753 0.29563475 0.24107099 0.21363139 0.16412997 0.16844749 0.17108274 0.17586136 0.11147547 0.2053895 ] mean value: 0.19238011837005614 key: score_time value: [0.02453613 0.02744293 0.04643822 0.02506137 0.02915144 0.03036785 0.03013134 0.0295155 0.03101945 0.03005862] mean value: 0.03037228584289551 key: test_mcc value: [0.57574525 0.67729621 0.45601459 0.67602163 0.89153439 0.56556341 0.75033796 0.85449735 0.61858957 0.64402061] mean value: 0.6709620987267373 key: train_mcc value: [0.98795161 0.98396735 0.99195168 0.98396735 0.98396735 0.98795103 0.98396631 0.98396631 0.98396631 0.98396631] mean value: 0.9855621621215631 key: test_accuracy value: [0.78181818 0.83636364 0.72727273 0.83636364 0.94545455 0.78181818 0.87272727 0.92727273 0.8 0.81818182] mean value: 0.8327272727272728 key: train_accuracy value: [0.99393939 0.99191919 0.9959596 0.99191919 0.99191919 0.99393939 0.99191919 
0.99191919 0.99191919 0.99191919] mean value: 0.9927272727272727 key: test_fscore value: [0.75 0.84210526 0.70588235 0.82352941 0.94545455 0.77777778 0.86792453 0.92857143 0.82539683 0.80769231] mean value: 0.827433444105855 key: train_fscore value: [0.99391481 0.99186992 0.99595142 0.99186992 0.99186992 0.99389002 0.99183673 0.99183673 0.99183673 0.99183673] mean value: 0.992671293954595 key: test_precision value: [0.85714286 0.8 0.75 0.875 0.92857143 0.80769231 0.92 0.92857143 0.74285714 0.875 ] mean value: 0.8484835164835165 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.88888889 0.66666667 0.77777778 0.96296296 0.75 0.82142857 0.92857143 0.92857143 0.75 ] mean value: 0.8141534391534392 key: train_recall value: [0.98790323 0.98387097 0.99193548 0.98387097 0.98387097 0.98785425 0.98380567 0.98380567 0.98380567 0.98380567] mean value: 0.9854528535980149 key: test_roc_auc value: [0.7797619 0.83730159 0.72619048 0.83531746 0.9457672 0.78240741 0.87367725 0.92724868 0.79761905 0.81944444] mean value: 0.832473544973545 key: train_roc_auc value: [0.99395161 0.99193548 0.99596774 0.99193548 0.99193548 0.99392713 0.99190283 0.99190283 0.99190283 0.99190283] mean value: 0.9927264267990075 key: test_jcc value: [0.6 0.72727273 0.54545455 0.7 0.89655172 0.63636364 0.76666667 0.86666667 0.7027027 0.67741935] mean value: 0.7119098024103586 key: train_jcc value: [0.98790323 0.98387097 0.99193548 0.98387097 0.98387097 0.98785425 0.98380567 0.98380567 0.98380567 0.98380567] mean value: 0.9854528535980149 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.7654438 0.7468729 0.75707912 0.74711108 0.76610827 0.76259875 0.75790429 0.75682592 0.75021052 0.74763083] mean value: 0.7557785511016846 key: score_time value: [0.00954056 0.01017046 0.00927854 0.00945091 0.0103693 0.00993395 0.00967669 0.00912714 0.00927448 0.00911355] mean value: 0.009593558311462403 key: test_mcc value: [0.92724868 0.96428571 0.85449735 0.89139151 1. 0.89602867 0.96423926 1. 0.96423926 0.89139151] mean value: 0.9353321952904994 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96363636 0.98181818 0.92727273 0.94545455 1. 0.94545455 0.98181818 1. 0.98181818 0.94545455] mean value: 0.9672727272727273 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.98181818 0.92592593 0.94339623 1. 0.94915254 0.98245614 1. 0.98245614 0.94736842] mean value: 0.9675536541249432 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 0.96428571 0.92592593 0.96153846 1. 0.90322581 0.96551724 1. 0.96551724 0.93103448] mean value: 0.9580007836681919 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96296296 1. 0.92592593 0.92592593 1. 1. 1. 1. 1. 0.96428571] mean value: 0.977910052910053 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96362434 0.98214286 0.92724868 0.94510582 1. 0.94444444 0.98148148 1. 
0.98148148 0.94510582] mean value: 0.9670634920634921 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.96428571 0.86206897 0.89285714 1. 0.90322581 0.96551724 1. 0.96551724 0.9 ] mean value: 0.9382043540441761 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.29 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03061318 0.03356743 0.03385997 0.03194618 0.03091121 0.030725 0.03083324 0.03064632 0.03090024 0.0306983 ] mean value: 0.031470108032226565 key: score_time value: [0.01244497 0.01246119 0.0126853 0.01307559 0.01310396 0.01295328 0.0130353 0.01303506 0.01296496 0.01314688] mean value: 0.01289064884185791 key: test_mcc value: [ 0.04111183 0.12729377 0.19302201 0.34263208 0.33604449 -0.13495839 0.00509277 0.06900656 0.31698002 0.1028689 ] mean value: 0.13990940292625395 key: train_mcc value: [0.4529978 0.39302546 0.53873258 0.55123208 0.36860407 0.29590134 0.29590134 0.40164502 0.63609111 0.37064115] mean value: 0.4304771947958042 key: test_accuracy value: [0.50909091 0.54545455 0.58181818 0.63636364 0.61818182 0.47272727 0.50909091 0.52727273 0.61818182 0.54545455] mean value: 0.5563636363636363 key: train_accuracy value: [0.67070707 0.63434343 0.72525253 0.73333333 0.62020202 0.57979798 
0.57979798 0.63838384 0.78787879 0.62020202] mean value: 0.658989898989899 key: test_fscore value: [0.63013699 0.64788732 0.65671642 0.71428571 0.71232877 0.63291139 0.65822785 0.66666667 0.72 0.65753425] mean value: 0.669669536331282 key: train_fscore value: [0.75265554 0.73264402 0.78481013 0.78980892 0.7251462 0.7037037 0.7037037 0.73402675 0.82470785 0.72434018] mean value: 0.747554697471538 key: test_precision value: [0.5 0.52272727 0.55 0.58139535 0.56521739 0.49019608 0.50980392 0.52 0.57446809 0.53333333] mean value: 0.5347141431308546 key: train_precision value: [0.60340633 0.57808858 0.64583333 0.65263158 0.56880734 0.54285714 0.54285714 0.57981221 0.70170455 0.56781609] mean value: 0.5983814285548509 key: test_recall value: [0.85185185 0.85185185 0.81481481 0.92592593 0.96296296 0.89285714 0.92857143 0.92857143 0.96428571 0.85714286] mean value: 0.8978835978835978 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.51521164 0.55092593 0.58597884 0.64153439 0.62433862 0.46494709 0.50132275 0.51984127 0.61177249 0.53968254] mean value: 0.5555555555555556 key: train_roc_auc value: [0.67004049 0.63360324 0.72469636 0.73279352 0.6194332 0.58064516 0.58064516 0.6391129 0.78830645 0.62096774] mean value: 0.6590244220974272 key: test_jcc value: [0.46 0.47916667 0.48888889 0.55555556 0.55319149 0.46296296 0.49056604 0.5 0.5625 0.48979592] mean value: 0.5042627519538972 key: train_jcc value: [0.60340633 0.57808858 0.64583333 0.65263158 0.56880734 0.54285714 0.54285714 0.57981221 0.70170455 0.56781609] mean value: 0.5983814285548509 MCC on Blind test: -0.06 Accuracy on Blind test: 0.17 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), 
('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02615452 0.03805971 0.03778291 0.03775167 0.03771639 0.03777146 0.02376747 0.03762388 0.03809237 0.03769779] mean value: 0.03524181842803955 key: score_time value: [0.01908135 0.01849127 0.01843929 0.0183835 0.01840234 0.01997757 0.018466 0.01841736 0.01839828 0.01835322] mean value: 0.018641018867492677 key: test_mcc value: [0.71588202 0.78410665 0.52935027 0.82337971 0.96423926 0.85449735 0.81878307 0.92724868 0.8565805 0.81878307] mean value: 0.8092850574012925 key: train_mcc value: [0.86672653 0.87081606 0.89125899 0.86667211 0.8586449 0.86274286 0.86274286 0.85457514 0.86265611 0.86673306] mean value: 0.8663568632850653 key: test_accuracy value: [0.85454545 0.89090909 0.76363636 0.90909091 0.98181818 0.92727273 0.90909091 0.96363636 0.92727273 0.90909091] mean value: 0.9036363636363636 key: train_accuracy value: [0.93333333 0.93535354 0.94545455 0.93333333 0.92929293 0.93131313 0.93131313 0.92727273 0.93131313 0.93333333] mean value: 0.9331313131313131 key: test_fscore value: [0.84 0.89285714 0.74509804 0.9122807 0.98113208 0.92857143 0.90909091 0.96428571 0.93103448 0.90909091] mean value: 0.9013441403096495 key: train_fscore value: [0.93386774 0.936 0.94632207 0.93360161 0.92985972 0.93172691 0.93172691 0.92741935 0.93145161 0.93360161] mean value: 0.9335577524823128 key: test_precision value: [0.91304348 0.86206897 0.79166667 0.86666667 1. 
0.92857143 0.92592593 0.96428571 0.9 0.92592593] mean value: 0.9078154771820439 key: train_precision value: [0.92828685 0.92857143 0.93333333 0.93172691 0.92430279 0.92430279 0.92430279 0.92369478 0.92771084 0.928 ] mean value: 0.927423251114875 key: test_recall value: [0.77777778 0.92592593 0.7037037 0.96296296 0.96296296 0.92857143 0.89285714 0.96428571 0.96428571 0.89285714] mean value: 0.8976190476190476 key: train_recall value: [0.93951613 0.94354839 0.95967742 0.93548387 0.93548387 0.93927126 0.93927126 0.93117409 0.93522267 0.93927126] mean value: 0.9397920203735144 key: test_roc_auc value: [0.8531746 0.89153439 0.76256614 0.91005291 0.98148148 0.92724868 0.90939153 0.96362434 0.9265873 0.90939153] mean value: 0.903505291005291 key: train_roc_auc value: [0.93332082 0.93533695 0.94542575 0.93332898 0.9292804 0.93132918 0.93132918 0.92728059 0.93132101 0.9333453 ] mean value: 0.9331298158547734 key: test_jcc value: [0.72413793 0.80645161 0.59375 0.83870968 0.96296296 0.86666667 0.83333333 0.93103448 0.87096774 0.83333333] mean value: 0.8261347742347465 key: train_jcc value: [0.87593985 0.87969925 0.89811321 0.8754717 0.86891386 0.87218045 0.87218045 0.86466165 0.87169811 0.8754717 ] mean value: 0.8754330228794374 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27136374 0.27113485 0.33172965 0.30581665 0.3005619 0.26949382 0.2750001 0.29058337 0.27251935 0.3191328 ] mean value: 0.29073362350463866 key: score_time value: [0.01856899 0.01850533 0.0185225 0.01847601 0.01860619 0.01852322 0.02130389 0.01841664 0.01855254 0.01855564] mean value: 0.018803095817565917 key: test_mcc value: [0.71588202 0.78410665 0.52935027 0.82337971 0.96423926 0.85449735 0.81878307 0.92724868 0.8565805 0.81878307] mean value: 0.8092850574012925 key: train_mcc value: [0.89916888 0.89497657 0.89125899 0.86667211 0.8586449 0.86274286 0.86274286 0.85457514 0.86265611 0.86673306] mean value: 0.8720171491265002 key: test_accuracy value: [0.85454545 0.89090909 0.76363636 0.90909091 0.98181818 0.92727273 0.90909091 0.96363636 0.92727273 0.90909091] mean value: 0.9036363636363636 key: train_accuracy value: [0.94949495 0.94747475 0.94545455 0.93333333 0.92929293 0.93131313 0.93131313 0.92727273 0.93131313 0.93333333] mean value: 0.935959595959596 key: test_fscore value: [0.84 0.89285714 0.74509804 0.9122807 0.98113208 0.92857143 0.90909091 0.96428571 0.93103448 0.90909091] mean value: 0.9013441403096495 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:175: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:178: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.9500998 0.94779116 0.94632207 0.93360161 0.92985972 0.93172691 0.93172691 0.92741935 0.93145161 0.93360161] mean value: 0.9363600754410022 key: test_precision value: [0.91304348 0.86206897 0.79166667 0.86666667 1. 
0.92857143 0.92592593 0.96428571 0.9 0.92592593] mean value: 0.9078154771820439 key: train_precision value: [0.94071146 0.944 0.93333333 0.93172691 0.92430279 0.92430279 0.92430279 0.92369478 0.92771084 0.928 ] mean value: 0.9302085692438272 key: test_recall value: [0.77777778 0.92592593 0.7037037 0.96296296 0.96296296 0.92857143 0.89285714 0.96428571 0.96428571 0.89285714] mean value: 0.8976190476190476 key: train_recall value: [0.95967742 0.9516129 0.95967742 0.93548387 0.93548387 0.93927126 0.93927126 0.93117409 0.93522267 0.93927126] mean value: 0.9426146010186758 key: test_roc_auc value: [0.8531746 0.89153439 0.76256614 0.91005291 0.98148148 0.92724868 0.90939153 0.96362434 0.9265873 0.90939153] mean value: 0.903505291005291 key: train_roc_auc value: [0.94947434 0.94746637 0.94542575 0.93332898 0.9292804 0.93132918 0.93132918 0.92728059 0.93132101 0.9333453 ] mean value: 0.935958110225937 key: test_jcc value: [0.72413793 0.80645161 0.59375 0.83870968 0.96296296 0.86666667 0.83333333 0.93103448 0.87096774 0.83333333] mean value: 0.8261347742347465 key: train_jcc value: [0.90494297 0.90076336 0.89811321 0.8754717 0.86891386 0.87218045 0.87218045 0.86466165 0.87169811 0.8754717 ] mean value: 0.8804397455608106 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03542757 0.03597736 0.03544164 0.03828049 0.03607202 0.03562999 0.03738952 0.02794266 0.03553414 0.03548431] mean value: 0.03531796932220459 key: score_time value: [0.01272678 0.01266503 0.01269794 0.01263142 0.01262355 0.01270127 0.01296377 0.01188087 0.01294971 0.01273322] mean value: 0.012657356262207032 key: test_mcc value: [0.82942474 0.86189955 0.68434084 0.75808552 0.82195294 0.85933785 0.75047877 0.93094934 0.71611487 0.78571429] mean value: 0.7998298708962035 key: train_mcc value: [0.85809003 0.86193803 0.86590623 0.87777588 0.87024737 0.84662074 0.84651574 0.85837416 0.88199914 0.86237183] mean value: 0.8629839142601098 key: test_accuracy value: [0.9122807 0.92982456 0.84210526 0.87719298 0.91071429 0.92857143 0.875 0.96428571 0.85714286 0.89285714] mean value: 0.8989974937343358 key: train_accuracy value: [0.92899408 0.93096647 0.93293886 0.93885602 0.93503937 0.92322835 0.92322835 0.92913386 0.94094488 0.93110236] mean value: 0.9314432589417447 key: test_fscore value: [0.91525424 0.93103448 0.84745763 0.8852459 0.90909091 0.92592593 0.87272727 0.96296296 0.86206897 0.89285714] mean value: 0.9004625427886199 key: train_fscore value: [0.9296875 0.93123772 0.93307087 0.93909627 0.93567251 0.92397661 0.92367906 0.9296875 0.94140625 0.93177388] mean value: 0.9319288166968593 key: test_precision value: [0.87096774 0.9 0.83333333 0.84375 0.92592593 0.96153846 0.88888889 1. 
0.83333333 0.89285714] mean value: 0.895059482781257 key: train_precision value: [0.92248062 0.92941176 0.92941176 0.93359375 0.92664093 0.91505792 0.91828794 0.92248062 0.93410853 0.92277992] mean value: 0.925425374907558 key: test_recall value: [0.96428571 0.96428571 0.86206897 0.93103448 0.89285714 0.89285714 0.85714286 0.92857143 0.89285714 0.89285714] mean value: 0.9078817733990148 key: train_recall value: [0.93700787 0.93307087 0.93675889 0.94466403 0.94488189 0.93307087 0.92913386 0.93700787 0.9488189 0.94094488] mean value: 0.9385359932775201 key: test_roc_auc value: [0.91317734 0.93041872 0.84174877 0.87623153 0.91071429 0.92857143 0.875 0.96428571 0.85714286 0.89285714] mean value: 0.8990147783251231 key: train_roc_auc value: [0.92897825 0.93096231 0.93294638 0.93886745 0.93503937 0.92322835 0.92322835 0.92913386 0.94094488 0.93110236] mean value: 0.931443154585914 key: test_jcc value: [0.84375 0.87096774 0.73529412 0.79411765 0.83333333 0.86206897 0.77419355 0.92857143 0.75757576 0.80645161] mean value: 0.820632415292945 key: train_jcc value: [0.86861314 0.87132353 0.87453875 0.88518519 0.87912088 0.85869565 0.85818182 0.86861314 0.88929889 0.87226277] mean value: 0.8725833753544835 MCC on Blind test: 0.31 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.7798295 0.99009943 0.76765776 0.93879628 0.81983972 0.7645843 0.95550585 0.77687287 0.81234503 0.94892621] mean value: 0.8554456949234008 key: score_time value: [0.01273251 0.01294208 0.01296949 0.01316237 0.01302671 0.01300812 0.0130589 0.01300597 0.01304603 0.01303864] mean value: 0.0129990816116333 key: test_mcc value: [0.86189955 0.82512315 0.64901478 0.79110556 0.82195294 0.89802651 0.78571429 0.85933785 0.71611487 0.85714286] mean value: 0.8065432358778236 key: train_mcc value: [0.88168194 0.89349683 0.88954592 0.90927764 0.89774912 0.88976378 0.88979136 0.93703692 0.90945587 0.89766562] mean value: 0.8995465023358031 key: test_accuracy value: [0.92982456 0.9122807 0.8245614 0.89473684 0.91071429 0.94642857 0.89285714 0.92857143 0.85714286 0.92857143] mean value: 0.9025689223057645 key: train_accuracy value: [0.9408284 0.94674556 0.94477318 0.95463511 0.9488189 0.94488189 0.94488189 
0.96850394 0.95472441 0.9488189 ] mean value: 0.9497612169780553 key: test_fscore value: [0.93103448 0.9122807 0.82758621 0.9 0.9122807 0.94339623 0.89285714 0.92592593 0.86206897 0.92857143] mean value: 0.9036001782450778 key: train_fscore value: [0.94117647 0.94695481 0.94466403 0.95463511 0.94921875 0.94488189 0.94509804 0.96837945 0.95463511 0.94901961] mean value: 0.9498663265993761 key: test_precision value: [0.9 0.89655172 0.82758621 0.87096774 0.89655172 1. 0.89285714 0.96153846 0.83333333 0.92857143] mean value: 0.9007957763408264 key: train_precision value: [0.9375 0.94509804 0.94466403 0.95275591 0.94186047 0.94488189 0.94140625 0.97222222 0.95652174 0.9453125 ] mean value: 0.9482223042580766 key: test_recall value: [0.96428571 0.92857143 0.82758621 0.93103448 0.92857143 0.89285714 0.89285714 0.89285714 0.89285714 0.92857143] mean value: 0.9080049261083745 key: train_recall value: [0.94488189 0.9488189 0.94466403 0.95652174 0.95669291 0.94488189 0.9488189 0.96456693 0.95275591 0.95275591] mean value: 0.9515358999097445 key: test_roc_auc value: [0.93041872 0.91256158 0.82450739 0.89408867 0.91071429 0.94642857 0.89285714 0.92857143 0.85714286 0.92857143] mean value: 0.9025862068965518 key: train_roc_auc value: [0.94082039 0.94674146 0.94477296 0.95463882 0.9488189 0.94488189 0.94488189 0.96850394 0.95472441 0.9488189 ] mean value: 0.9497603560424512 key: test_jcc value: [0.87096774 0.83870968 0.70588235 0.81818182 0.83870968 0.89285714 0.80645161 0.86206897 0.75757576 0.86666667] mean value: 0.8258071413417223 key: train_jcc value: [0.88888889 0.89925373 0.89513109 0.91320755 0.90334572 0.89552239 0.89591078 0.93869732 0.91320755 0.90298507] mean value: 0.9046150086984556 MCC on Blind test: 0.3 Accuracy on Blind test: 0.69 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive 
Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01444316 0.01245856 0.01029587 0.00997543 0.0098331 0.00987601 0.00983953 0.00994825 0.00983977 0.00989795] mean value: 0.010640764236450195 key: score_time value: [0.0122149 0.0091784 0.00903082 0.00886178 0.00877905 0.00872421 0.00874662 0.00869894 0.00871277 0.00874066] mean value: 0.00916881561279297 key: test_mcc value: [0.78940887 0.57973205 0.65634573 0.58562417 0.65814518 0.58255173 0.67900461 0.65814518 0.50128041 0.72168784] mean value: 0.6411925754198944 key: train_mcc value: [0.67104275 0.66923233 0.64864227 0.6717949 0.66098223 0.6638126 0.64404637 0.6813177 0.66816241 0.69204983] mean value: 0.6671083389867292 key: test_accuracy value: [0.89473684 0.78947368 0.8245614 0.78947368 0.82142857 0.76785714 0.83928571 0.82142857 0.75 0.85714286] mean value: 0.8155388471177945 key: train_accuracy value: [0.83037475 0.83037475 0.82051282 0.83234714 0.82677165 0.82874016 0.81692913 0.83858268 0.83070866 0.84055118] mean value: 0.829589293202255 key: test_fscore value: [0.89285714 0.77777778 0.81481481 0.77777778 0.8 0.71111111 0.83636364 0.8 0.74074074 0.84615385] mean value: 0.7997596847596848 key: train_fscore value: [0.81465517 0.81623932 0.80513919 0.81876333 0.81276596 0.81606765 0.79913607 0.82916667 0.81779661 0.825054 ] mean value: 0.8154783953529364 key: test_precision value: [0.89285714 0.80769231 0.88 0.84 0.90909091 0.94117647 0.85185185 0.90909091 0.76923077 0.91666667] mean value: 
0.8717657027068791 key: train_precision value: [0.9 0.89252336 0.87850467 0.88888889 0.88425926 0.88127854 0.88516746 0.88053097 0.8853211 0.9138756 ] mean value: 0.8890349860913827 key: test_recall value: [0.89285714 0.75 0.75862069 0.72413793 0.71428571 0.57142857 0.82142857 0.71428571 0.71428571 0.78571429] mean value: 0.7447044334975369 key: train_recall value: [0.74409449 0.7519685 0.743083 0.75889328 0.7519685 0.75984252 0.72834646 0.78346457 0.75984252 0.7519685 ] mean value: 0.7533472347577106 key: test_roc_auc value: [0.89470443 0.7887931 0.82573892 0.79064039 0.82142857 0.76785714 0.83928571 0.82142857 0.75 0.85714286] mean value: 0.8157019704433498 key: train_roc_auc value: [0.83054527 0.83052971 0.8203604 0.83220255 0.82677165 0.82874016 0.81692913 0.83858268 0.83070866 0.84055118] mean value: 0.8295921384332887 key: test_jcc value: [0.80645161 0.63636364 0.6875 0.63636364 0.66666667 0.55172414 0.71875 0.66666667 0.58823529 0.73333333] mean value: 0.6692054984345847 key: train_jcc value: [0.68727273 0.68953069 0.67383513 0.69314079 0.68458781 0.68928571 0.66546763 0.70818505 0.69175627 0.70220588] mean value: 0.6885267694805385 MCC on Blind test: 0.33 Accuracy on Blind test: 0.75 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01014781 0.01013613 0.01016474 0.01117969 0.01014805 0.0102253 0.01012969 0.010077 0.01030874 0.01011229] mean value: 0.01026294231414795 key: score_time value: [0.00869703 0.00873399 0.00877452 0.01073337 0.00870085 0.00876904 0.00877881 0.00878739 0.00882244 0.00872064] mean value: 0.008951807022094726 key: test_mcc value: [0.8953202 0.61805122 0.64901478 0.51048128 0.67900461 0.78772636 0.67900461 0.79385662 0.53881591 0.75047877] mean value: 0.6901754345176191 key: train_mcc value: [0.73570695 0.71249972 0.77122983 0.73596545 0.74805469 0.73622618 0.75197433 0.74052487 0.73718664 0.77225227] mean value: 0.7441620925818874 key: test_accuracy value: [0.94736842 0.80701754 0.8245614 0.75438596 0.83928571 0.89285714 0.83928571 0.89285714 0.76785714 0.875 ] mean value: 0.844047619047619 key: train_accuracy value: [0.8678501 0.85601578 0.88560158 0.8678501 0.87401575 0.86811024 0.87598425 0.87007874 0.86811024 0.88582677] mean value: 0.87194435384926 key: test_fscore value: [0.94736842 0.81355932 0.82758621 0.75 0.84210526 0.88888889 0.83636364 0.88461538 0.77966102 0.87719298] mean value: 0.8447341122414179 key: train_fscore value: [0.8678501 0.85370741 0.88582677 0.86573146 0.8745098 0.8678501 0.87573964 0.87209302 0.86464646 0.88803089] mean value: 0.8715985671472862 key: test_precision value: [0.93103448 0.77419355 0.82758621 0.77777778 0.82758621 0.92307692 0.85185185 0.95833333 0.74193548 0.86206897] mean value: 0.8475444780366916 key: train_precision value: [0.86956522 0.86938776 0.88235294 0.87804878 0.87109375 0.86956522 0.87747036 0.85877863 0.8879668 
0.87121212] mean value: 0.8735441569425723 key: test_recall value: [0.96428571 0.85714286 0.82758621 0.72413793 0.85714286 0.85714286 0.82142857 0.82142857 0.82142857 0.89285714] mean value: 0.8444581280788177 key: train_recall value: [0.86614173 0.83858268 0.88932806 0.85375494 0.87795276 0.86614173 0.87401575 0.88582677 0.84251969 0.90551181] mean value: 0.8699775917338396 key: test_roc_auc value: [0.9476601 0.80788177 0.82450739 0.75492611 0.83928571 0.89285714 0.83928571 0.89285714 0.76785714 0.875 ] mean value: 0.8442118226600985 key: train_roc_auc value: [0.86785347 0.85605023 0.88560891 0.86782235 0.87401575 0.86811024 0.87598425 0.87007874 0.86811024 0.88582677] mean value: 0.8719460956708475 key: test_jcc value: [0.9 0.68571429 0.70588235 0.6 0.72727273 0.8 0.71875 0.79310345 0.63888889 0.78125 ] mean value: 0.735086170309294 key: train_jcc value: [0.76655052 0.74475524 0.795053 0.76325088 0.77700348 0.76655052 0.77894737 0.77319588 0.76156584 0.79861111] mean value: 0.7725483853417521 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00978541 0.01127982 0.0107739 0.01075292 0.01079726 0.01070213 0.01073289 0.01096082 0.01066756 0.00969982] mean value: 0.010615253448486328 key: score_time value: [0.01713634 0.01846361 0.01252007 0.01257229 0.01251316 0.01299095 0.01284051 0.01288319 0.01281261 0.01232457] mean value: 0.013705730438232422 key: test_mcc value: [0.69397486 0.36835853 0.33621986 0.54592083 0.39310793 0.4645821 0.50128041 0.49030429 0.53605627 0.5728919 ] mean value: 0.49026969817834043 key: train_mcc value: [0.68061695 0.70442556 0.66993258 0.70071825 0.70157079 0.69368798 0.68114987 0.66560714 0.70499218 0.68640255] mean value: 0.688910385901818 key: test_accuracy value: [0.84210526 0.68421053 0.66666667 0.77192982 0.69642857 0.73214286 0.75 0.73214286 0.76785714 0.78571429] mean value: 0.7429197994987469 key: train_accuracy value: [0.84023669 0.85207101 0.83431953 0.85009862 0.8503937 0.84645669 0.84055118 0.83267717 0.8523622 0.84251969] mean value: 0.844168646818556 key: test_fscore value: [0.82352941 0.66666667 0.65454545 0.78688525 0.69090909 0.72727273 0.74074074 0.68085106 0.76363636 0.77777778] mean value: 0.7312814543044954 key: train_fscore value: [0.8389662 0.8502994 0.82857143 0.84677419 0.84677419 0.84274194 0.83960396 0.83033932 0.8502994 0.83739837] mean value: 0.8411768412067648 key: test_precision value: [0.91304348 0.69230769 0.69230769 0.75 0.7037037 0.74074074 0.76923077 0.84210526 0.77777778 0.80769231] mean value: 0.7688909425179448 key: train_precision value: [0.84738956 0.86234818 0.85654008 0.86419753 0.8677686 0.86363636 0.84462151 0.84210526 
0.86234818 0.86554622] mean value: 0.8576501484027818 key: test_recall value: [0.75 0.64285714 0.62068966 0.82758621 0.67857143 0.71428571 0.71428571 0.57142857 0.75 0.75 ] mean value: 0.7019704433497537 key: train_recall value: [0.83070866 0.83858268 0.80237154 0.83003953 0.82677165 0.82283465 0.83464567 0.81889764 0.83858268 0.81102362] mean value: 0.8254458311288164 key: test_roc_auc value: [0.84051724 0.68349754 0.66748768 0.77093596 0.69642857 0.73214286 0.75 0.73214286 0.76785714 0.78571429] mean value: 0.7426724137931034 key: train_roc_auc value: [0.84025552 0.85209766 0.83425664 0.85005913 0.8503937 0.84645669 0.84055118 0.83267717 0.8523622 0.84251969] mean value: 0.8441629578911332 key: test_jcc value: [0.7 0.5 0.48648649 0.64864865 0.52777778 0.57142857 0.58823529 0.51612903 0.61764706 0.63636364] mean value: 0.5792716505904362 key: train_jcc value: [0.72260274 0.73958333 0.70731707 0.73426573 0.73426573 0.728223 0.72354949 0.70989761 0.73958333 0.72027972] mean value: 0.7259567763866404 MCC on Blind test: 0.24 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02472115 0.02486706 0.02309799 0.0221045 0.02213407 0.02268434 0.02683425 0.02616549 0.02615142 0.0264852 ] mean value: 0.024524545669555663 key: score_time value: [0.01321721 0.01203775 0.01212263 0.01310039 0.0120523 0.01226783 0.01339436 0.01332831 0.01342893 0.01346922] mean value: 0.012841892242431641 key: test_mcc value: [0.86189955 0.8953202 0.79110556 0.72064772 0.78571429 0.8660254 0.71611487 0.8660254 0.61065803 0.78571429] mean value: 0.7899225309271085 key: train_mcc value: [0.78700923 0.79097672 0.79093074 0.81067833 0.80324922 0.78361641 0.80709287 0.78361641 0.81889764 0.79134472] mean value: 0.7967412266329543 key: test_accuracy value: [0.92982456 0.94736842 0.89473684 0.85964912 0.89285714 0.92857143 0.85714286 0.92857143 0.80357143 0.89285714] mean value: 0.893515037593985 key: train_accuracy value: [0.89349112 0.89546351 0.89546351 0.90532544 0.9015748 0.89173228 0.90354331 0.89173228 0.90944882 0.89566929] mean value: 0.8983444377145164 key: test_fscore value: [0.93103448 0.94736842 0.9 0.86666667 0.89285714 0.92307692 0.85185185 0.92307692 0.81355932 0.89285714] mean value: 0.89423488762318 key: train_fscore value: [0.89328063 0.8962818 0.8950495 0.90551181 0.90234375 0.89278752 0.90373281 0.89278752 0.90944882 0.89587426] mean value: 0.8987098439098706 key: test_precision value: [0.9 0.93103448 0.87096774 0.83870968 0.89285714 1. 0.88461538 1. 
0.77419355 0.89285714] mean value: 0.8985235120830226 key: train_precision value: [0.8968254 0.89105058 0.8968254 0.90196078 0.89534884 0.88416988 0.90196078 0.88416988 0.90944882 0.89411765] mean value: 0.8955878017441364 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.89285714 0.85714286 0.82142857 0.85714286 0.85714286 0.89285714] mean value: 0.8934729064039408 key: train_recall value: [0.88976378 0.9015748 0.89328063 0.90909091 0.90944882 0.9015748 0.90551181 0.9015748 0.90944882 0.8976378 ] mean value: 0.9018906974572842 key: test_roc_auc value: [0.93041872 0.9476601 0.89408867 0.85899015 0.89285714 0.92857143 0.85714286 0.92857143 0.80357143 0.89285714] mean value: 0.893472906403941 key: train_roc_auc value: [0.89349849 0.89545143 0.89545921 0.90533286 0.9015748 0.89173228 0.90354331 0.89173228 0.90944882 0.89566929] mean value: 0.8983442781114811 key: test_jcc value: [0.87096774 0.9 0.81818182 0.76470588 0.80645161 0.85714286 0.74193548 0.85714286 0.68571429 0.80645161] mean value: 0.8108694152147662 key: train_jcc value: [0.80714286 0.81205674 0.81003584 0.82733813 0.82206406 0.80633803 0.82437276 0.80633803 0.83393502 0.8113879 ] mean value: 0.8161009358062393 MCC on Blind test: 0.23 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.05455732 1.96423841 2.07713985 2.01222348 1.87226844 1.90104604 2.04310274 2.02595901 1.96324086 1.9570272 ] mean value: 1.9870803356170654 key: score_time value: [0.01421905 0.01303458 0.01450396 0.01240849 0.01312137 0.01311612 0.01234913 0.01241875 0.01241255 0.01332283] mean value: 0.013090682029724122 key: test_mcc value: [0.79161589 0.75462449 0.79110556 0.75462449 0.78772636 0.85714286 0.71611487 0.8660254 0.64450339 0.78772636] mean value: 0.7751209688559202 key: train_mcc value: [0.98425123 0.98823457 0.99211042 0.98034517 0.99215674 1. 0.98819663 0.98825791 0.99212598 0.99607071] mean value: 0.9901749372582905 key: test_accuracy value: [0.89473684 0.87719298 0.89473684 0.87719298 0.89285714 0.92857143 0.85714286 0.92857143 0.82142857 0.89285714] mean value: 0.8865288220551378 key: train_accuracy value: [0.99211045 0.99408284 0.99605523 0.99013807 0.99606299 1. 0.99409449 0.99409449 0.99606299 0.9980315 ] mean value: 0.9950733044464116 key: test_fscore value: [0.89655172 0.87272727 0.9 0.88135593 0.89655172 0.92857143 0.86206897 0.92307692 0.82758621 0.88888889] mean value: 0.8877379066157558 key: train_fscore value: [0.99215686 0.99412916 0.99604743 0.99017682 0.99604743 1. 
0.99410609 0.99405941 0.99606299 0.99802761] mean value: 0.9950813802058787 key: test_precision value: [0.86666667 0.88888889 0.87096774 0.86666667 0.86666667 0.92857143 0.83333333 1. 0.8 0.92307692] mean value: 0.8844838315806058 key: train_precision value: [0.98828125 0.98832685 0.99604743 0.984375 1. 1. 0.99215686 1. 0.99606299 1. ] mean value: 0.9945250383950149 key: test_recall value: [0.92857143 0.85714286 0.93103448 0.89655172 0.92857143 0.92857143 0.89285714 0.85714286 0.85714286 0.85714286] mean value: 0.8934729064039408 key: train_recall value: [0.99606299 1. 0.99604743 0.99604743 0.99212598 1. 0.99606299 0.98818898 0.99606299 0.99606299] mean value: 0.9956661790793937 key: test_roc_auc value: [0.8953202 0.87684729 0.89408867 0.87684729 0.89285714 0.92857143 0.85714286 0.92857143 0.82142857 0.89285714] mean value: 0.8864532019704434 key: train_roc_auc value: [0.99210264 0.99407115 0.99605521 0.9901497 0.99606299 1. 0.99409449 0.99409449 0.99606299 0.9980315 ] mean value: 0.9950725156391024 key: test_jcc value: [0.8125 0.77419355 0.81818182 0.78787879 0.8125 0.86666667 0.75757576 0.85714286 0.70588235 0.8 ] mean value: 0.7992521788774161 key: train_jcc value: [0.9844358 0.98832685 0.99212598 0.98054475 0.99212598 1. 
0.98828125 0.98818898 0.99215686 0.99606299] mean value: 0.9902249442749081 MCC on Blind test: 0.23 Accuracy on Blind test: 0.63 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02935839 0.02018619 0.0232408 0.02205086 0.02052236 0.023561 0.02732038 0.02198553 0.02232289 0.02229857] mean value: 0.02328469753265381 key: score_time value: [0.01187396 0.00923061 0.00890207 0.00886345 0.00889254 0.00902057 0.0091393 0.00891209 0.00899506 0.00882149] mean value: 0.00926511287689209 key: test_mcc value: [0.93202124 0.8951918 0.92980296 0.8951918 0.82195294 0.96490128 0.71611487 0.78772636 0.89342711 0.85933785] mean value: 0.8695668223710578 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.94736842 0.96491228 0.94736842 0.91071429 0.98214286 0.85714286 0.89285714 0.94642857 0.92857143] mean value: 0.9342418546365915 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.94545455 0.96551724 0.94915254 0.90909091 0.98181818 0.86206897 0.89655172 0.94736842 0.93103448] mean value: 0.9351019976545216 key: train_fscore value: [1. 1. 1. 1. 1. 1. 
1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96296296 0.96551724 0.93333333 0.92592593 1. 0.83333333 0.86666667 0.93103448 0.9 ] mean value: 0.9318773946360154 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 0.92857143 0.96551724 0.96551724 0.89285714 0.96428571 0.89285714 0.92857143 0.96428571 0.96428571] mean value: 0.9395320197044336 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96428571 0.94704433 0.96490148 0.94704433 0.91071429 0.98214286 0.85714286 0.89285714 0.94642857 0.92857143] mean value: 0.9341133004926109 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.89655172 0.93333333 0.90322581 0.83333333 0.96428571 0.75757576 0.8125 0.9 0.87096774] mean value: 0.8800344839624595 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.47 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12648606 0.1256876 0.12501502 0.1250596 0.12481833 0.12559652 0.1252358 0.12575173 0.1248858 0.12514901] mean value: 0.1253685474395752 key: score_time value: [0.01796365 0.01816535 0.01793432 0.01795077 0.01797152 0.0180378 0.01830125 0.01778531 0.01789355 0.01786304] mean value: 0.017986655235290527 key: test_mcc value: [0.8953202 0.71921182 0.82490815 0.72064772 0.71428571 0.96490128 0.82195294 0.8660254 0.68250015 0.78571429] mean value: 0.79954676684031 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.85964912 0.9122807 0.85964912 0.85714286 0.98214286 0.91071429 0.92857143 0.83928571 0.89285714] mean value: 0.8989661654135338 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94736842 0.85714286 0.91525424 0.86666667 0.85714286 0.98181818 0.9122807 0.92307692 0.84745763 0.89285714] mean value: 0.9001065615918425 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93103448 0.85714286 0.9 0.83870968 0.85714286 1. 0.89655172 1. 0.80645161 0.89285714] mean value: 0.8979890354361989 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.85714286 0.93103448 0.89655172 0.85714286 0.96428571 0.92857143 0.85714286 0.89285714 0.89285714] mean value: 0.9041871921182266 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.9476601 0.85960591 0.91194581 0.85899015 0.85714286 0.98214286 0.91071429 0.92857143 0.83928571 0.89285714] mean value: 0.8988916256157635 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9 0.75 0.84375 0.76470588 0.75 0.96428571 0.83870968 0.85714286 0.73529412 0.80645161] mean value: 0.8210339861751152 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.31 Accuracy on Blind test: 0.72 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01171422 0.01144671 0.01151061 0.01173711 0.01158261 0.01144481 0.01164675 0.01174784 0.01157475 0.01150846] mean value: 0.011591386795043946 key: score_time value: [0.00953698 0.00944829 0.00955725 0.00954795 0.0095222 0.00948381 0.00958061 0.00948858 0.00951004 0.00953937] mean value: 0.00952150821685791 key: test_mcc value: [0.69397486 0.54592083 0.47413793 0.54759338 0.39310793 0.75434227 0.5 0.5 0.60753044 0.39513166] mean value: 0.5411739308965824 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.84210526 0.77192982 0.73684211 0.77192982 0.69642857 0.875 0.75 0.75 0.80357143 0.69642857] mean value: 0.7694235588972431 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.75471698 0.73684211 0.76363636 0.70175439 0.86792453 0.75 0.75 0.80701754 0.67924528] mean value: 0.7634666602941619 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.8 0.75 0.80769231 0.68965517 0.92 0.75 0.75 0.79310345 0.72 ] mean value: 0.7893494406642833 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.71428571 0.72413793 0.72413793 0.71428571 0.82142857 0.75 0.75 0.82142857 0.64285714] mean value: 0.7412561576354679 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.84051724 0.77093596 0.73706897 0.77278325 0.69642857 0.875 0.75 0.75 0.80357143 0.69642857] mean value: 0.7692733990147783 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.60606061 0.58333333 0.61764706 0.54054054 0.76666667 0.6 0.6 0.67647059 0.51428571] mean value: 0.6205004507945684 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.68 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.0131824 1.91670203 1.87928843 1.87886953 1.83877921 1.85278034 1.86355448 1.92332649 1.94182944 1.91238904] mean value: 1.9020701408386231 key: score_time value: [0.10155416 0.09879255 0.09464073 0.09240246 0.09679675 0.14493012 0.09252048 0.10167122 0.09474611 0.10077143] mean value: 0.10188260078430175 key: test_mcc value: [0.96547546 0.8953202 0.92980296 0.82512315 0.89342711 1. 0.85933785 0.92857143 0.78772636 0.96490128] mean value: 0.9049685794519872 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.94736842 0.96491228 0.9122807 0.94642857 1. 0.92857143 0.96428571 0.89285714 0.98214286] mean value: 0.9521303258145364 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.94736842 0.96551724 0.9122807 0.94545455 1. 0.93103448 0.96428571 0.89655172 0.98181818] mean value: 0.9526129194459503 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93103448 0.96551724 0.92857143 0.96296296 1.
0.9 0.96428571 0.86666667 1. ] mean value: 0.9519038496624703 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 0.89655172 0.92857143 1. 0.96428571 0.96428571 0.92857143 0.96428571] mean value: 0.954064039408867 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.9476601 0.96490148 0.91256158 0.94642857 1. 0.92857143 0.96428571 0.89285714 0.98214286] mean value: 0.9521551724137932 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.9 0.93333333 0.83870968 0.89655172 1. 0.87096774 0.93103448 0.8125 0.96428571] mean value: 0.9111668388156152 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.97926283 0.95430398 0.98551869 0.96115994 1.1477139 0.9665091 0.97480011 0.94035792 0.9657414 1.04465652] mean value: 0.9920024394989013 key: score_time value: [0.26140761 0.27186894 0.21786308 0.26194715 0.22197795 0.17087865 0.23330307 0.19104862 0.26959944 0.21822429] mean value: 0.23181188106536865 key: test_mcc value: [0.96547546 0.8953202 0.8953202 0.85960591 0.89342711 1. 0.82195294 0.89342711 0.72168784 0.96490128] mean value: 0.8911118047885251 key: train_mcc value: [0.95269145 0.9605814 0.94872473 0.95661511 0.95278544 0.95278544 0.96062992 0.95278544 0.95670033 0.95278544] mean value: 0.9547084708624481 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.92982456 0.94642857 1. 0.91071429 0.94642857 0.85714286 0.98214286] mean value: 0.9449874686716792 key: train_accuracy value: [0.97633136 0.98027613 0.97435897 0.97830375 0.97637795 0.97637795 0.98031496 0.97637795 0.97834646 0.97637795] mean value: 0.9773443445308981 key: test_fscore value: [0.98181818 0.94736842 0.94736842 0.93103448 0.94545455 1. 0.9122807 0.94545455 0.86666667 0.98181818] mean value: 0.9459264147830391 key: train_fscore value: [0.97647059 0.98039216 0.97425743 0.97830375 0.97647059 0.97647059 0.98031496 0.97647059 0.978389 0.97647059] mean value: 0.9774010229981591 key: test_precision value: [1. 0.93103448 0.96428571 0.93103448 0.96296296 1. 0.89655172 0.96296296 0.8125 1. 
] mean value: 0.9461332329866813 key: train_precision value: [0.97265625 0.9765625 0.97619048 0.97637795 0.97265625 0.97265625 0.98031496 0.97265625 0.97647059 0.97265625] mean value: 0.9749197727811597 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.93103448 0.92857143 1. 0.92857143 0.92857143 0.92857143 0.96428571] mean value: 0.9469211822660099 key: train_recall value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98031496 0.98031496 0.98031496 0.98031496 0.98031496 0.98031496] mean value: 0.979902586287386 key: test_roc_auc value: [0.98214286 0.9476601 0.9476601 0.92980296 0.94642857 1. 0.91071429 0.94642857 0.85714286 0.98214286] mean value: 0.945012315270936 key: train_roc_auc value: [0.97632349 0.98026828 0.97435498 0.97830755 0.97637795 0.97637795 0.98031496 0.97637795 0.97834646 0.97637795] mean value: 0.9773427531044786 key: test_jcc value: [0.96428571 0.9 0.9 0.87096774 0.89655172 1.
0.83870968 0.89655172 0.76470588 0.96428571] mean value: 0.8996058178555071 key: train_jcc value: [0.95402299 0.96153846 0.94980695 0.95752896 0.95402299 0.95402299 0.96138996 0.95402299 0.95769231 0.95402299] mean value: 0.9558071580485373 MCC on Blind test: 0.26 Accuracy on Blind test: 0.6 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02486706 0.01062918 0.01203179 0.01114845 0.01061964 0.01092672 0.01033497 0.01076055 0.01072574 0.01054311] mean value: 0.01225872039794922 key: score_time value: [0.01115727 0.00907111 0.00894833 0.00886679 0.00894332 0.00974488 0.00890231 0.00917554 0.0092206 0.00940967] mean value: 0.009343981742858887 key: test_mcc value: [0.8953202 0.61805122 0.64901478 0.51048128 0.67900461 0.78772636 0.67900461 0.79385662 0.53881591 0.75047877] mean value: 0.6901754345176191 key: train_mcc value: [0.73570695 0.71249972 0.77122983 0.73596545 0.74805469 0.73622618 0.75197433 0.74052487 0.73718664 0.77225227] mean value: 0.7441620925818874 key: test_accuracy value: [0.94736842 0.80701754 0.8245614 0.75438596 0.83928571 0.89285714 0.83928571 0.89285714 0.76785714 0.875 ] mean value: 0.844047619047619 key: train_accuracy value: [0.8678501 0.85601578 0.88560158 0.8678501 0.87401575 0.86811024 0.87598425 
0.87007874 0.86811024 0.88582677] mean value: 0.87194435384926 key: test_fscore value: [0.94736842 0.81355932 0.82758621 0.75 0.84210526 0.88888889 0.83636364 0.88461538 0.77966102 0.87719298] mean value: 0.8447341122414179 key: train_fscore value: [0.8678501 0.85370741 0.88582677 0.86573146 0.8745098 0.8678501 0.87573964 0.87209302 0.86464646 0.88803089] mean value: 0.8715985671472862 key: test_precision value: [0.93103448 0.77419355 0.82758621 0.77777778 0.82758621 0.92307692 0.85185185 0.95833333 0.74193548 0.86206897] mean value: 0.8475444780366916 key: train_precision value: [0.86956522 0.86938776 0.88235294 0.87804878 0.87109375 0.86956522 0.87747036 0.85877863 0.8879668 0.87121212] mean value: 0.8735441569425723 key: test_recall value: [0.96428571 0.85714286 0.82758621 0.72413793 0.85714286 0.85714286 0.82142857 0.82142857 0.82142857 0.89285714] mean value: 0.8444581280788177 key: train_recall value: [0.86614173 0.83858268 0.88932806 0.85375494 0.87795276 0.86614173 0.87401575 0.88582677 0.84251969 0.90551181] mean value: 0.8699775917338396 key: test_roc_auc value: [0.9476601 0.80788177 0.82450739 0.75492611 0.83928571 0.89285714 0.83928571 0.89285714 0.76785714 0.875 ] mean value: 0.8442118226600985 key: train_roc_auc value: [0.86785347 0.85605023 0.88560891 0.86782235 0.87401575 0.86811024 0.87598425 0.87007874 0.86811024 0.88582677] mean value: 0.8719460956708475 key: test_jcc value: [0.9 0.68571429 0.70588235 0.6 0.72727273 0.8 0.71875 0.79310345 0.63888889 0.78125 ] mean value: 0.735086170309294 key: train_jcc value: [0.76655052 0.74475524 0.795053 0.76325088 0.77700348 0.76655052 0.77894737 0.77319588 0.76156584 0.79861111] mean value: 0.7725483853417521 MCC on Blind test: 0.28 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09275723 0.24454331 0.06994462 0.07444978 0.06929159 0.0760982 0.07451367 0.07683897 0.07969928 0.07621717] mean value: 0.09343538284301758 key: score_time value: [0.01097846 0.01088572 0.01100826 0.01090145 0.01062918 0.01087093 0.0107832 0.01081181 0.01131725 0.01134419] mean value: 0.010953044891357422 key: test_mcc value: [0.96547546 0.92980296 0.92980296 0.92980296 0.85714286 1. 0.89802651 0.89342711 0.85933785 0.96490128] mean value: 0.9227719933358153 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.96491228 0.92857143 1. 0.94642857 0.94642857 0.92857143 0.98214286] mean value: 0.9609335839598997 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.98181818 0.96428571 0.96551724 0.96551724 0.92857143 1. 0.94915254 0.94736842 0.93103448 0.98181818] mean value: 0.9615083435436261 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96551724 0.96551724 0.92857143 1. 0.90322581 0.93103448 0.9 1. ] mean value: 0.9558151914825997 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 0.96551724 0.92857143 1. 1. 0.96428571 0.96428571 0.96428571] mean value: 0.9681034482758621 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.96490148 0.96490148 0.96490148 0.92857143 1. 0.94642857 0.94642857 0.92857143 0.98214286] mean value: 0.9608990147783252 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.93103448 0.93333333 0.93333333 0.86666667 1. 0.90322581 0.9 0.87096774 0.96428571] mean value: 0.926713279305048 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.34 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04199457 0.06921983 0.0511663 0.08315659 0.06977725 0.07992172 0.07494569 0.07559061 0.07782745 0.07561278] mean value: 0.06992127895355224 key: score_time value: [0.02025461 0.01197267 0.01904845 0.01850629 0.01926947 0.01906586 0.01851892 0.01858687 0.02082109 0.01868105] mean value: 0.0184725284576416 key: test_mcc value: [0.82942474 0.86189955 0.82490815 0.68736396 0.82195294 0.85933785 0.75047877 0.85933785 0.64450339 0.75047877] mean value: 0.7889685968126741 key: train_mcc value: [0.90543486 0.89754406 0.88566582 0.91321465 0.90576456 0.89774912 0.89766562 0.90163769 0.89788834 0.89774912] mean value: 0.9000313853409169 key: test_accuracy value: [0.9122807 0.92982456 0.9122807 0.84210526 0.91071429 0.92857143 0.875 0.92857143 0.82142857 0.875 ] mean value: 0.893577694235589 key: train_accuracy value: [0.95266272 0.94871795 0.94280079 0.9566075 0.95275591 0.9488189 0.9488189 0.9507874 0.9488189 0.9488189 ] mean value: 0.9499607852272903 key: test_fscore value: [0.91525424 0.93103448 0.91525424 0.85245902 0.9122807 0.92592593 0.87272727 0.92592593 0.82758621 0.87272727] mean value: 0.8951175279685669 
key: train_fscore value: [0.953125 0.94921875 0.94302554 0.95652174 0.95330739 0.94921875 0.94901961 0.95107632 0.94941634 0.94921875] mean value: 0.9503148193596516 key: test_precision value: [0.87096774 0.9 0.9 0.8125 0.89655172 0.96153846 0.88888889 0.96153846 0.8 0.88888889] mean value: 0.8880874166928115 key: train_precision value: [0.94573643 0.94186047 0.9375 0.95652174 0.94230769 0.94186047 0.9453125 0.94552529 0.93846154 0.94186047] mean value: 0.9436946591185824 key: test_recall value: [0.96428571 0.96428571 0.93103448 0.89655172 0.92857143 0.89285714 0.85714286 0.89285714 0.85714286 0.85714286] mean value: 0.9041871921182266 key: train_recall value: [0.96062992 0.95669291 0.9486166 0.95652174 0.96456693 0.95669291 0.95275591 0.95669291 0.96062992 0.95669291] mean value: 0.957049267062961 key: test_roc_auc value: [0.91317734 0.93041872 0.91194581 0.841133 0.91071429 0.92857143 0.875 0.92857143 0.82142857 0.875 ] mean value: 0.8935960591133005 key: train_roc_auc value: [0.95264698 0.94870219 0.94281224 0.95660733 0.95275591 0.9488189 0.9488189 0.9507874 0.9488189 0.9488189 ] mean value: 0.9499587625657465 key: test_jcc value: [0.84375 0.87096774 0.84375 0.74285714 0.83870968 0.86206897 0.77419355 0.86206897 0.70588235 0.77419355] mean value: 0.8118441942961834 key: train_jcc value: [0.91044776 0.90334572 0.89219331 0.91666667 0.91078067 0.90334572 0.90298507 0.90671642 0.9037037 0.90334572] mean value: 0.9053530776518071 MCC on Blind test: 0.21 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01435184 0.01032829 0.01071692 0.01100564 0.00968957 0.00969958 0.00994968 0.0099864 0.00970316 0.00968027] mean value: 0.010511136054992676 key: score_time value: [0.01026058 0.00964212 0.00945568 0.00853848 0.00872827 0.00901937 0.00866747 0.00857186 0.0086143 0.00862527] mean value: 0.009012341499328613 key: test_mcc value: [0.82942474 0.79161589 0.78940887 0.61453202 0.68250015 0.8660254 0.64285714 0.79385662 0.5728919 0.75047877] mean value: 0.7333591505072501 key: train_mcc value: [0.75542311 0.7321357 0.73964396 0.72420838 0.75989552 0.74024928 0.75202096 0.76377953 0.73264695 0.77559656] mean value: 0.7475599944964467 key: test_accuracy value: [0.9122807 0.89473684 0.89473684 0.80701754 0.83928571 0.92857143 0.82142857 0.89285714 0.78571429 0.875 ] mean value: 0.8651629072681705 key: train_accuracy value: [0.87771203 0.86587771 0.86982249 0.86193294 0.87992126 0.87007874 0.87598425 0.88188976 0.86614173 0.88779528] mean value: 0.8737156191274907 key: test_fscore value: [0.91525424 0.89655172 0.89655172 0.80701754 0.83018868 0.92307692 0.82142857 0.88461538 0.79310345 0.87719298] mean value: 0.8644981218521811 key: train_fscore value: [0.87795276 0.864 0.86956522 0.85943775 0.88062622 0.86904762 0.87524752 0.88188976 0.864 0.88757396] mean value: 0.8729340819469472 key: test_precision value: [0.87096774 0.86666667 0.89655172 0.82142857 0.88 1. 
0.82142857 0.95833333 0.76666667 0.86206897] mean value: 0.8744112241114466 key: train_precision value: [0.87795276 0.87804878 0.86956522 0.87346939 0.87548638 0.876 0.88047809 0.88188976 0.87804878 0.88932806] mean value: 0.8780267218020522 key: test_recall value: [0.96428571 0.92857143 0.89655172 0.79310345 0.78571429 0.85714286 0.82142857 0.82142857 0.82142857 0.89285714] mean value: 0.8582512315270936 key: train_recall value: [0.87795276 0.8503937 0.86956522 0.8458498 0.88582677 0.86220472 0.87007874 0.88188976 0.8503937 0.88582677] mean value: 0.8679981948896704 key: test_roc_auc value: [0.91317734 0.8953202 0.89470443 0.80726601 0.83928571 0.92857143 0.82142857 0.89285714 0.78571429 0.875 ] mean value: 0.865332512315271 key: train_roc_auc value: [0.87771156 0.86590831 0.86982198 0.86190128 0.87992126 0.87007874 0.87598425 0.88188976 0.86614173 0.88779528] mean value: 0.8737154150197628 key: test_jcc value: [0.84375 0.8125 0.8125 0.67647059 0.70967742 0.85714286 0.6969697 0.79310345 0.65714286 0.78125 ] mean value: 0.7640506867121406 key: train_jcc value: [0.78245614 0.76056338 0.76923077 0.75352113 0.78671329 0.76842105 0.77816901 0.78873239 0.76056338 0.79787234] mean value: 0.7746242885126692 MCC on Blind test: 0.3 Accuracy on Blind test: 0.73 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0148561 0.02070785 0.02067232 0.0191505 0.02243805 0.02403593 0.02616668 0.02317834 0.01781106 0.01990294] mean value: 0.02089197635650635 key: score_time value: [0.00977182 0.01106358 0.01167631 0.0116396 0.01164293 0.01165843 0.01170874 0.01168728 0.01164556 0.01165771] mean value: 0.011415195465087891 key: test_mcc value: [0.79161589 0.79161589 0.77728159 0.82490815 0.73127242 0.92857143 0.78772636 0.85714286 0.68250015 0.68250015] mean value: 0.7855134886225912 key: train_mcc value: [0.83859222 0.88998239 0.77868981 0.8606491 0.82347315 0.89001213 0.87673787 0.85625065 0.89454004 0.83839667] mean value: 0.854732402199318 key: test_accuracy value: [0.89473684 0.89473684 0.87719298 0.9122807 0.85714286 0.96428571 0.89285714 0.92857143 0.83928571 0.83928571] mean value: 0.8900375939849624 key: train_accuracy value: [0.91913215 0.94477318 0.8816568 0.92899408 0.90551181 0.94488189 0.93700787 0.92716535 0.94685039 0.91732283] mean value: 0.9253296370498066 key: test_fscore value: [0.89655172 0.89655172 0.89230769 0.91525424 0.84 0.96428571 0.89655172 0.92857143 0.84745763 0.84745763] mean value: 0.8924989499104052 key: train_fscore value: [0.91816367 0.94573643 0.89208633 0.92592593 0.89655172 0.94552529 0.93939394 0.92952381 0.94567404 0.92105263] mean value: 0.9259633804353411 key: test_precision value: [0.86666667 0.86666667 0.80555556 0.9 0.95454545 0.96428571 0.86666667 0.92857143 0.80645161 0.80645161] mean value: 0.8765861378764604 key: train_precision value: [0.93117409 0.93129771 0.81848185 0.96566524 0.99047619 
0.93461538 0.90510949 0.900369 0.96707819 0.88129496] mean value: 0.9225562104390707 key: test_recall value: [0.92857143 0.92857143 1. 0.93103448 0.75 0.96428571 0.92857143 0.92857143 0.89285714 0.89285714] mean value: 0.9145320197044335 key: train_recall value: [0.90551181 0.96062992 0.98023715 0.88932806 0.81889764 0.95669291 0.97637795 0.96062992 0.92519685 0.96456693] mean value: 0.9338069154399178 key: test_roc_auc value: [0.8953202 0.8953202 0.875 0.91194581 0.85714286 0.96428571 0.89285714 0.92857143 0.83928571 0.83928571] mean value: 0.8899014778325123 key: train_roc_auc value: [0.91915907 0.94474184 0.88185086 0.928916 0.90551181 0.94488189 0.93700787 0.92716535 0.94685039 0.91732283] mean value: 0.9253407923811895 key: test_jcc value: [0.8125 0.8125 0.80555556 0.84375 0.72413793 0.93103448 0.8125 0.86666667 0.73529412 0.73529412] mean value: 0.8079232871309443 key: train_jcc value: [0.84870849 0.89705882 0.80519481 0.86206897 0.8125 0.89667897 0.88571429 0.8683274 0.89694656 0.85365854] mean value: 0.8626856837436376 MCC on Blind test: 0.29 Accuracy on Blind test: 0.71 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01960063 0.0234971 0.02287674 0.0235045 0.01932335 0.01994562 0.02235985 0.02164769 0.02076983 0.02375412] mean value: 0.021727943420410158 key: score_time value: [0.01173115 0.0117259 0.0115416 0.01154709 0.01157951 0.01162434 0.01164246 0.0116179 0.01165915 0.01167464] mean value: 0.011634373664855957 key: test_mcc value: [0.8951918 0.50752605 0.73477227 0.51004294 0.78772636 0.79385662 0.78571429 0.82618439 0.67900461 0.78772636] mean value: 0.7307745676699011 key: train_mcc value: [0.81207793 0.62496305 0.88020472 0.69213593 0.82633424 0.83976368 0.87159992 0.83520301 0.85017081 0.84267432] mean value: 0.8075127612144095 key: test_accuracy value: [0.94736842 0.71929825 0.85964912 0.73684211 0.89285714 0.89285714 0.89285714 0.91071429 0.83928571 0.89285714] mean value: 0.8584586466165414 key: train_accuracy value: [0.90335306 0.78106509 0.93885602 0.82642998 0.90944882 0.91732283 0.93503937 0.91338583 0.92125984 0.91929134] mean value: 0.8965452173507897 key: test_fscore value: [0.94545455 0.77142857 0.875 0.7826087 0.89655172 0.9 0.89285714 0.91525424 0.83636364 0.89655172] mean value: 0.8712070277320068 key: train_fscore value: [0.89770355 0.82067851 0.94095238 0.85084746 0.91512915 0.92164179 0.93690249 0.91911765 0.91561181 0.92307692] mean value: 0.9041661713849551 key: test_precision value: [0.96296296 0.64285714 0.8 0.675 0.86666667 0.84375 0.89285714 0.87096774 0.85185185 0.86666667] mean value: 0.8273580175797918 key: train_precision value: [0.95555556 0.69589041 0.90808824 0.74480712 0.86111111 0.87588652 0.91078067 0.86206897 
0.98636364 0.88172043] mean value: 0.8682272660537491 key: test_recall value: [0.92857143 0.96428571 0.96551724 0.93103448 0.92857143 0.96428571 0.89285714 0.96428571 0.82142857 0.92857143] mean value: 0.9289408866995074 key: train_recall value: [0.84645669 1. 0.97628458 0.99209486 0.97637795 0.97244094 0.96456693 0.98425197 0.85433071 0.96850394] mean value: 0.9535308580498584 key: test_roc_auc value: [0.94704433 0.72352217 0.85775862 0.73337438 0.89285714 0.89285714 0.89285714 0.91071429 0.83928571 0.89285714] mean value: 0.8583128078817734 key: train_roc_auc value: [0.9034655 0.78063241 0.93892969 0.82675609 0.90944882 0.91732283 0.93503937 0.91338583 0.92125984 0.91929134] mean value: 0.8965531729482431 key: test_jcc value: [0.89655172 0.62790698 0.77777778 0.64285714 0.8125 0.81818182 0.80645161 0.84375 0.71875 0.8125 ] mean value: 0.7757227052602081 key: train_jcc value: [0.81439394 0.69589041 0.88848921 0.74041298 0.84353741 0.85467128 0.88129496 0.85034014 0.84435798 0.85714286] mean value: 0.8270531167459525 MCC on Blind test: 0.24 Accuracy on Blind test: 0.56 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20300555 0.18559599 0.18765807 0.18791032 0.18824935 0.1863656 0.18636107 0.18714428 0.18626761 0.18867087] mean value: 0.18872287273406982 key: score_time value: [0.01514959 0.01522899 0.01563787 0.01534891 0.01529288 0.01523805 0.01523829 0.01545191 0.01530719 0.01528692] mean value: 0.015318059921264648 key: test_mcc value: [0.96547546 0.92980296 0.92980296 0.96547546 0.89342711 0.93094934 0.89802651 0.85714286 0.89342711 0.96490128] mean value: 0.9228431033982084 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.98245614 0.94642857 0.96428571 0.94642857 0.92857143 0.94642857 0.98214286] mean value: 0.9609022556390977 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.96428571 0.96551724 0.98305085 0.94736842 0.96296296 0.94915254 0.92857143 0.94736842 0.98181818] mean value: 0.9611913942771552 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96551724 0.96666667 0.93103448 1. 0.90322581 0.92857143 0.93103448 1. ] mean value: 0.9590335822871974 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.96428571 0.96551724 1. 0.96428571 0.92857143 1. 0.92857143 0.96428571 0.96428571] mean value: 0.9644088669950739 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.96490148 0.96490148 0.98214286 0.94642857 0.96428571 0.94642857 0.92857143 0.94642857 0.98214286] mean value: 0.9608374384236454 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.93103448 0.93333333 0.96666667 0.9 0.92857143 0.90322581 0.86666667 0.9 0.96428571] mean value: 0.9258069813019758 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.39 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06400371 0.07849145 0.07418036 0.06228375 0.07443714 0.07705092 0.08038139 0.06570339 0.08440733 0.08470249] mean value: 0.07456419467926026 key: score_time value: [0.02218103 0.02349353 0.03681755 0.0226469 0.02388501 0.02647114 0.02291846 0.03203106 0.03182936 0.02605414] mean value: 0.02683281898498535 key: test_mcc value: [0.96547546 0.92980296 0.96551724 0.93202124 0.89342711 1. 0.93094934 0.89342711 0.89802651 0.96490128] mean value: 0.937354824745975 key: train_mcc value: [1. 0.99606299 0.97239383 0.99606299 0.98428248 0.99215674 0.98032256 0.98819663 0.99212598 0.99215674] mean value: 0.9893760958242528 key: test_accuracy value: [0.98245614 0.96491228 0.98245614 0.96491228 0.94642857 1. 0.96428571 0.94642857 0.94642857 0.98214286] mean value: 0.9680451127819548 key: train_accuracy value: [1. 0.99802761 0.98619329 0.99802761 0.99212598 0.99606299 0.99015748 0.99409449 0.99606299 0.99606299] mean value: 0.9946815449843918 key: test_fscore value: [0.98181818 0.96428571 0.98245614 0.96666667 0.94736842 1. 0.96551724 0.94736842 0.94915254 0.98181818] mean value: 0.9686451510797076 key: train_fscore value: [1. 0.99802761 0.98613861 0.99802761 0.99215686 0.99607843 0.99017682 0.99408284 0.99606299 0.99607843] mean value: 0.9946830215827512 key: test_precision value: [1. 0.96428571 1. 0.93548387 0.93103448 1. 0.93333333 0.93103448 0.90322581 1. ] mean value: 0.9598397690555643 key: train_precision value: [1. 1. 
0.98809524 0.99606299 0.98828125 0.9921875 0.98823529 0.99604743 0.99606299 0.9921875 ] mean value: 0.9937160197294893 key: test_recall value: [0.96428571 0.96428571 0.96551724 1. 0.96428571 1. 1. 0.96428571 1. 0.96428571] mean value: 0.9786945812807882 key: train_recall value: [1. 0.99606299 0.98418972 1. 0.99606299 1. 0.99212598 0.99212598 0.99606299 1. ] mean value: 0.9956630668202048 key: test_roc_auc value: [0.98214286 0.96490148 0.98275862 0.96428571 0.94642857 1. 0.96428571 0.94642857 0.94642857 0.98214286] mean value: 0.9679802955665026 key: train_roc_auc value: [1. 0.9980315 0.98618935 0.9980315 0.99212598 0.99606299 0.99015748 0.99409449 0.99606299 0.99606299] mean value: 0.9946819271108898 key: test_jcc value: [0.96428571 0.93103448 0.96551724 0.93548387 0.9 1. 0.93333333 0.9 0.90322581 0.96428571] mean value: 0.9397166163462047 key: train_jcc value: [1. 
0.99606299 0.97265625 0.99606299 0.9844358 0.9921875 0.98054475 0.98823529 0.99215686 0.9921875 ] mean value: 0.9894529935861796 MCC on Blind test: 0.11 Accuracy on Blind test: 0.32 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.16183567 0.13382697 0.17709804 0.17817044 0.19914985 0.18385673 0.18355298 0.17946386 0.18800902 0.17873001] mean value: 0.1763693571090698 key: score_time value: [0.02537632 0.01524782 0.02521944 0.02523208 0.02549791 0.0310111 0.0255177 0.02803326 0.02813077 0.02520919] mean value: 0.025447559356689454 key: test_mcc value: [0.8951918 0.50927421 0.65018988 0.57973205 0.46697379 0.71428571 0.61065803 0.61706091 0.60753044 0.71611487] mean value: 0.6367011683025293 key: train_mcc value: [0.98823511 0.98434388 0.98434291 0.98823457 0.98825791 0.98437404 0.98437404 0.98038334 0.98428248 0.98437404] mean value: 0.9851202321407464 key: test_accuracy value: [0.94736842 0.75438596 0.8245614 0.78947368 0.73214286 0.85714286 0.80357143 0.80357143 0.80357143 0.85714286] mean value: 0.8172932330827067 key: train_accuracy value: [0.99408284 0.99211045 0.99211045 0.99408284 0.99409449 0.99212598 0.99212598 0.99015748 0.99212598 0.99212598] mean value: 0.9925142493283015 key: test_fscore 
value: [0.94545455 0.74074074 0.83333333 0.8 0.71698113 0.85714286 0.79245283 0.78431373 0.80701754 0.86206897] mean value: 0.8139505673802714 key: train_fscore value: [0.99405941 0.99206349 0.99203187 0.99403579 0.99405941 0.99206349 0.99206349 0.99009901 0.99209486 0.99206349] mean value: 0.9924634309494456 key: test_precision value: [0.96296296 0.76923077 0.80645161 0.77419355 0.76 0.85714286 0.84 0.86956522 0.79310345 0.83333333] mean value: 0.8265983749627411 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.99601594 0.99603175 1. ] mean value: 0.9992047682286727 key: test_recall value: [0.92857143 0.71428571 0.86206897 0.82758621 0.67857143 0.85714286 0.75 0.71428571 0.82142857 0.89285714] mean value: 0.804679802955665 key: train_recall value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197 0.98425197 0.98425197 0.98818898 0.98425197] mean value: 0.985815878746382 key: test_roc_auc value: [0.94704433 0.75369458 0.82389163 0.7887931 0.73214286 0.85714286 0.80357143 0.80357143 0.80357143 0.85714286] mean value: 0.8170566502463055 key: train_roc_auc value: [0.99409449 0.99212598 0.99209486 0.99407115 0.99409449 0.99212598 0.99212598 0.99015748 0.99212598 0.99212598] mean value: 0.9925142385857895 key: test_jcc value: [0.89655172 0.58823529 0.71428571 0.66666667 0.55882353 0.75 0.65625 0.64516129 0.67647059 0.75757576] mean value: 0.6910020564753356 key: train_jcc value: [0.98818898 0.98425197 0.98418972 0.98814229 0.98818898 0.98425197 0.98425197 0.98039216 0.98431373 0.98425197] mean value: 0.9850423724934871 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.7932086 0.77255511 0.77291465 0.76873159 0.76692247 0.77482271 0.77445388 0.78742743 0.77633667 0.7769289 ] mean value: 0.7764302015304565 key: score_time value: [0.01081681 0.00929284 0.00945687 0.00935292 0.0094583 0.00916243 0.01019382 0.00930643 0.0095222 0.00936031] mean value: 0.009592294692993164 key: test_mcc value: [0.93202124 0.92980296 0.92980296 0.93202124 0.89342711 1. 0.93094934 0.89342711 0.89802651 0.92857143] mean value: 0.9268049893869115 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.96491228 0.96491228 0.96491228 0.94642857 1. 0.96428571 0.94642857 0.94642857 0.96428571] mean value: 0.9627506265664161 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.96428571 0.96551724 0.96666667 0.94736842 1. 0.96551724 0.94736842 0.94915254 0.96428571] mean value: 0.9633124925437824 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96428571 0.96551724 0.93548387 0.93103448 1. 0.93333333 0.93103448 0.90322581 0.96428571] mean value: 0.9528200646220668 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 0.96428571 0.96551724 1. 0.96428571 1. 1. 0.96428571 1. 0.96428571] mean value: 0.9751231527093596 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96428571 0.96490148 0.96490148 0.96428571 0.94642857 1. 
0.96428571 0.94642857 0.94642857 0.96428571] mean value: 0.9626231527093597 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.93103448 0.93333333 0.93548387 0.9 1. 0.93333333 0.9 0.90322581 0.93103448] mean value: 0.9296016738174692 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.29 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03165007 0.03190446 0.03232837 0.0449326 0.05335832 0.05189848 0.0477066 0.03186512 0.04317474 0.0371747 ] mean value: 0.04059934616088867 key: score_time value: [0.01240468 0.01262307 0.0133605 0.0146513 0.02085543 0.01273656 0.01313639 0.01332498 0.01820731 0.01342344] mean value: 0.014472365379333496 key: test_mcc value: [ 0.10833157 0.24917763 0.06746787 0.23089176 0.09325048 0.10206207 0. -0.06262243 0.26111648 0. 
] mean value: 0.10496754415294487 key: train_mcc value: [0.35660528 0.46863058 0.29603453 0.48234536 0.53181602 0.36249568 0.37284508 0.40307741 0.69981145 0.36941213] mean value: 0.43430735340982196 key: test_accuracy value: [0.52631579 0.59649123 0.52631579 0.59649123 0.53571429 0.53571429 0.5 0.48214286 0.60714286 0.5 ] mean value: 0.5406328320802005 key: train_accuracy value: [0.61341223 0.68047337 0.57988166 0.68836292 0.72047244 0.61614173 0.62204724 0.63976378 0.82874016 0.62007874] mean value: 0.6609374272002981 key: test_fscore value: [0.65822785 0.68493151 0.66666667 0.69333333 0.64864865 0.65789474 0.65 0.63291139 0.69444444 0.64102564] mean value: 0.6628084218316483 key: train_fscore value: [0.72159091 0.75820896 0.70375522 0.76204819 0.78153846 0.72261735 0.72571429 0.73516643 0.85378151 0.72467903] mean value: 0.7489100342144692 key: test_precision value: [0.50980392 0.55555556 0.51923077 0.56521739 0.52173913 0.52083333 0.5 0.49019608 0.56818182 0.5 ] mean value: 0.5250757998040607 key: train_precision value: [0.56444444 0.61057692 0.54291845 0.61557178 0.64141414 0.56570156 0.56950673 0.5812357 0.74486804 0.56823266] mean value: 0.6004470420827805 key: test_recall value: [0.92857143 0.89285714 0.93103448 0.89655172 0.85714286 0.89285714 0.92857143 0.89285714 0.89285714 0.89285714] mean value: 0.9006157635467981 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.53325123 0.60160099 0.51908867 0.591133 0.53571429 0.53571429 0.5 0.48214286 0.60714286 0.5 ] mean value: 0.5405788177339901 key: train_roc_auc value: [0.61264822 0.6798419 0.58070866 0.68897638 0.72047244 0.61614173 0.62204724 0.63976378 0.82874016 0.62007874] mean value: 0.6609419252435343 key: test_jcc value: [0.49056604 0.52083333 0.5 0.53061224 0.48 0.49019608 0.48148148 0.46296296 0.53191489 0.47169811] mean value: 0.49602651456675273 key: train_jcc value: [0.56444444 0.61057692 0.54291845 0.61557178 0.64141414 0.56570156 0.56950673 0.5812357 0.74486804 0.56823266] mean value: 0.6004470420827805 MCC on Blind test: -0.06 Accuracy on Blind test: 0.18 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, 
scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02423191 0.03349376 0.03827691 0.03822851 0.04923773 0.03836203 0.03804398 0.03832388 0.03832483 0.03844643] mean value: 0.03749699592590332 key: score_time value: [0.0185101 0.01860046 0.01860714 0.01879597 0.01897502 0.01841974 0.01845193 0.01842713 0.01836824 0.01837802] mean value: 0.01855337619781494 key: test_mcc value: [0.82942474 0.8953202 0.78940887 0.75808552 0.85714286 0.89802651 0.71611487 0.93094934 0.68250015 0.75047877] mean value: 0.8107451823738316 key: train_mcc value: [0.86590205 0.87777017 0.86194018 0.87387949 0.87062545 0.86638349 0.86638349 0.86638349 0.88616336 0.86624915] mean value: 0.8701680316316706 key: test_accuracy value: [0.9122807 0.94736842 0.89473684 0.87719298 0.92857143 0.94642857 0.85714286 0.96428571 0.83928571 0.875 ] mean value: 0.9042293233082707 key: train_accuracy value: [0.93293886 0.93885602 0.93096647 0.93688363 0.93503937 0.93307087 0.93307087 0.93307087 0.94291339 0.93307087] mean value: 0.9349881190886642 key: test_fscore value: [0.91525424 0.94736842 0.89655172 0.8852459 0.92857143 0.94339623 0.85185185 0.96296296 0.84745763 0.87272727] mean value: 0.9051387653765297 /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:195: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_orig.py:198: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore value: [0.93333333 0.93933464 0.93096647 0.9372549 0.93617021 0.93385214 0.93385214 0.93385214 0.94368932 0.93359375] mean value: 0.935589904607467 key: test_precision value: [0.87096774 0.93103448 0.89655172 0.84375 0.92857143 1. 0.88461538 1. 0.80645161 0.88888889] mean value: 0.9050831263810963 key: train_precision value: [0.9296875 0.93385214 0.92913386 0.92996109 0.92015209 0.92307692 0.92307692 0.92307692 0.93103448 0.92635659] mean value: 0.926940852023113 key: test_recall value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714 0.82142857 0.92857143 0.89285714 0.85714286] mean value: 0.9077586206896552 key: train_recall value: [0.93700787 0.94488189 0.93280632 0.94466403 0.95275591 0.94488189 0.94488189 0.94488189 0.95669291 0.94094488] mean value: 0.9444399489589493 key: test_roc_auc value: [0.91317734 0.9476601 0.89470443 0.87623153 0.92857143 0.94642857 0.85714286 0.96428571 0.83928571 0.875 ] mean value: 0.9042487684729065 key: train_roc_auc value: [0.93293081 0.93884411 0.93097009 0.93689894 0.93503937 0.93307087 0.93307087 0.93307087 0.94291339 0.93307087] mean value: 0.9349880178021226 key: test_jcc value: [0.84375 0.9 0.8125 0.79411765 0.86666667 0.89285714 0.74193548 0.92857143 0.73529412 0.77419355] mean value: 0.8289886035059185 key: train_jcc value: [0.875 0.88560886 0.87084871 0.88191882 0.88 0.87591241 0.87591241 0.87591241 0.89338235 0.87545788] mean value: 0.8789953838440262 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27105737 0.27678013 0.37362313 0.41940355 0.39782715 0.30619216 0.28300643 0.27874351 0.36507368 0.30322099] mean value: 0.3274928092956543 key: score_time value: [0.0186367 0.01867247 0.01899481 0.0188005 0.01891422 0.01871634 0.01878166 0.01856637 0.01868105 0.01864028] mean value: 0.01874043941497803 key: test_mcc value: [0.82942474 0.8953202 0.78940887 0.79110556 0.85714286 0.89802651 0.78571429 0.93094934 0.68250015 0.75047877] mean value: 0.821007127710734 key: train_mcc value: [0.86590205 0.87777017 0.86194018 0.90927623 0.90576456 0.88987413 0.88588856 0.86638349 0.88616336 0.86624915] mean value: 0.8815211891146428 key: test_accuracy value: [0.9122807 0.94736842 0.89473684 0.89473684 0.92857143 0.94642857 0.89285714 0.96428571 0.83928571 0.875 ] mean value: 0.9095551378446115 key: train_accuracy value: [0.93293886 0.93885602 0.93096647 0.95463511 0.95275591 0.94488189 0.94291339 0.93307087 0.94291339 0.93307087] mean value: 0.9407002748916741 key: test_fscore value: [0.91525424 0.94736842 0.89655172 0.9 0.92857143 0.94339623 0.89285714 0.96296296 0.84745763 0.87272727] mean value: 0.9107147043131244 key: train_fscore value: [0.93333333 0.93933464 0.93096647 0.95445545 0.95330739 0.9453125 0.94324853 0.93385214 0.94368932 0.93359375] mean value: 0.9411093522022579 key: test_precision 
value: [0.87096774 0.93103448 0.89655172 0.87096774 0.92857143 1. 0.89285714 1. 0.80645161 0.88888889] mean value: 0.9086290763988205 key: train_precision value: [0.9296875 0.93385214 0.92913386 0.95634921 0.94230769 0.9379845 0.93774319 0.92307692 0.93103448 0.92635659] mean value: 0.9347526078770776 key: test_recall value: [0.96428571 0.96428571 0.89655172 0.93103448 0.92857143 0.89285714 0.89285714 0.92857143 0.89285714 0.85714286] mean value: 0.9149014778325123 key: train_recall value: [0.93700787 0.94488189 0.93280632 0.95256917 0.96456693 0.95275591 0.9488189 0.94488189 0.95669291 0.94094488] mean value: 0.9475926675173508 key: test_roc_auc value: [0.91317734 0.9476601 0.89470443 0.89408867 0.92857143 0.94642857 0.89285714 0.96428571 0.83928571 0.875 ] mean value: 0.9096059113300493 key: train_roc_auc value: [0.93293081 0.93884411 0.93097009 0.95463104 0.95275591 0.94488189 0.94291339 0.93307087 0.94291339 0.93307087] mean value: 0.9406982353490398 key: test_jcc value: [0.84375 0.9 0.8125 0.81818182 0.86666667 0.89285714 0.80645161 0.92857143 0.73529412 0.77419355] mean value: 0.8378466335214438 key: train_jcc value: [0.875 0.88560886 0.87084871 0.91287879 0.91078067 0.8962963 0.89259259 0.87591241 0.89338235 0.87545788] mean value: 0.888875854764648 MCC on Blind test: 0.25 Accuracy on Blind test: 0.7