/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_sl.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 858
PASS: my_features_df and aa_df successfully combined
nrows: 858
ncols: 269
count of NULL values before imputation
or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data with stratification according to scaling law [COMPLETE data]: 1/sqrt(x_ncols)
Input features data size: (858, 175)
Train data size: (793, 175)
Test data size: (65, 175)
y_train numbers: Counter({0: 682, 1: 111})
y_train ratio: 6.1441441441441444
y_test_numbers: Counter({0: 56, 1: 9})
y_test ratio: 6.222222222222222
-------------------------------------------------------------
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: False
Original Data
Counter({0: 682, 1: 111})
Data dim: (793, 175)
Simple Random OverSampling
Counter({0: 682, 1: 682})
(1364, 175)
Simple Random UnderSampling
Counter({0: 111, 1: 111})
(222, 175)
Simple Combined Over and UnderSampling
Counter({0: 682, 1: 682})
(1364, 175)
SMOTE_NC OverSampling
Counter({0: 682, 1: 682})
(1364, 175)
#####################################################################
Running ML analysis [COMPLETE DATA]: 70/30 split
Gene name: embB
Drug name: ethambutol
Output directory: /home/tanu/git/Data/ethambutol/output/ml/tts_cd_sl/
Sanity checks:
Total input features: 175
Training data size: (793, 175)
Test data size: (65, 175)
Target feature numbers (training data): Counter({0: 682, 1: 111})
Target features ratio (training data: 6.1441441441441444
Target feature numbers (test data): Counter({0: 56, 1: 9})
Target features ratio (test data): 6.222222222222222
#####################################################################
================================================================
Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline:
Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0474534 0.0887804 0.17873287 0.09015036 0.12496829 0.15901494 0.1283195 0.17443967 0.16761804 0.1949439 ]
mean value: 0.13544213771820068

key: score_time
value: [0.01321054 0.02090883 0.03018737 0.02814436 0.02515435 0.03415728 0.04635882 0.01815367 0.02486181 0.0253098 ]
mean value: 0.026644682884216307

key: test_mcc
value: [0.49436016 0.50761192 0.67783439 0.66135521 0.71339159 0.64658323 0.54627358 0.57419245 0.29875024 0.57478846]
mean value: 0.5695141241239622

key: train_mcc
value: [0.69573947 0.72894477 0.67798071 0.68752657 0.67386786 0.67504963 0.71550727 0.6866705 0.69479833 0.70118527]
mean value: 0.693727039663616

key: test_accuracy
value: [0.9 0.8875 0.925 0.92405063 0.93670886 0.92405063 0.89873418 0.91139241 0.86075949 0.91139241]
mean value: 0.9079588607594937

key: train_accuracy
value: [0.93267882 0.93969144 0.92987377 0.93137255 0.92857143 0.92857143 0.93697479 0.93137255 0.93277311 0.93417367]
mean value: 0.9326053563080211

key: test_fscore
value: [0.42857143 0.57142857 0.66666667 0.7 0.73684211 0.66666667 0.6 0.53333333 0.35294118 0.58823529]
mean value: 0.584468524251806

key: train_fscore
value: [0.72093023 0.74853801 0.70238095 0.71005917 0.69822485 0.70175439 0.73684211 0.70658683 0.71764706 0.72189349]
mean value: 0.7164857087826803

key: test_precision
value: [1. 0.6 1. 0.77777778 0.875 0.85714286 0.66666667 1. 0.5 0.83333333]
mean value: 0.8109920634920635

key: train_precision
value: [0.86111111 0.90140845 0.85507246 0.86956522 0.85507246 0.84507042 0.88732394 0.88059701 0.87142857 0.88405797]
mean value: 0.8710707630308493

key: test_recall
value: [0.27272727 0.54545455 0.5 0.63636364 0.63636364 0.54545455 0.54545455 0.36363636 0.27272727 0.45454545]
mean value: 0.47727272727272724

key: train_recall
value: [0.62 0.64 0.5959596 0.6 0.59 0.6 0.63 0.59 0.61 0.61 ]
mean value: 0.6085959595959596

key: test_roc_auc
value: [0.63636364 0.74374177 0.75 0.80347594 0.81082888 0.76537433 0.75066845 0.68181818 0.61430481 0.71991979]
mean value: 0.7276495776176083

key: train_roc_auc
value: [0.80184339 0.81429038 0.78983648 0.79267101 0.78685668 0.79104235 0.80848534 0.78848534 0.79767101 0.79848534]
mean value: 0.7969667312260502

key: test_jcc
value: [0.27272727 0.4 0.5 0.53846154 0.58333333 0.5 0.42857143 0.36363636 0.21428571 0.41666667]
mean value: 0.42176823176823175

key: train_jcc
value: [0.56363636 0.59813084 0.5412844 0.55045872 0.53636364 0.54054054 0.58333333 0.5462963 0.55963303 0.56481481]
mean value: 0.5584491972895471

MCC on Blind test: 0.57
Accuracy on Blind test: 0.89

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [3.60542393 3.47140622 2.3955617 2.10153127 2.12939835 2.20545292 2.10316896 2.01559663 2.27213907 2.13030434] mean value: 2.4429983377456663 key: score_time value: [0.05521178 0.02824688 0.03234005 0.02881169 0.03445387 0.03774905 0.03767657 0.04927754 0.03254128 0.04663587] mean value: 0.03829445838928223 key: test_mcc value: [0.57458119 0.57839262 0.73674318 0.71339159 0.66135521 0.71339159 0.50667099 0.57419245 0.47192513 0.57478846] mean value: 0.6105432419982048 key: train_mcc value: [0.83609113 0.78340423 0.86005148 0.75742384 0.74441319 0.74441319 0.79014378 0.7642695 0.74989358 0.74361667] mean value: 0.777372058017057 key: test_accuracy value: [0.9125 0.9 0.9375 0.93670886 0.92405063 0.93670886 0.88607595 0.91139241 0.87341772 0.91139241] mean value: 0.9129746835443038 key: train_accuracy value: [0.96213184 0.95091164 0.96774194 0.94537815 0.94257703 0.94257703 0.95238095 0.94677871 
0.94397759 0.94257703] mean value: 0.949703191234418 key: test_fscore value: [0.53333333 0.63636364 0.76190476 0.73684211 0.7 0.73684211 0.57142857 0.53333333 0.54545455 0.58823529] mean value: 0.6343737686462144 key: train_fscore value: [0.85405405 0.80225989 0.87567568 0.77966102 0.76836158 0.76836158 0.80898876 0.78651685 0.77011494 0.76571429] mean value: 0.7979708643746889 key: test_precision value: [1. 0.63636364 0.88888889 0.875 0.77777778 0.875 0.6 1. 0.54545455 0.83333333] mean value: 0.8031818181818182 key: train_precision value: [0.92941176 0.92207792 0.94186047 0.8961039 0.88311688 0.88311688 0.92307692 0.8974359 0.90540541 0.89333333] mean value: 0.9074939373489305 key: test_recall value: [0.36363636 0.63636364 0.66666667 0.63636364 0.63636364 0.63636364 0.54545455 0.36363636 0.54545455 0.45454545] mean value: 0.5484848484848485 key: train_recall value: [0.79 0.71 0.81818182 0.69 0.68 0.68 0.72 0.7 0.67 0.67 ] mean value: 0.7128181818181818 key: test_roc_auc value: [0.68181818 0.78919631 0.82598039 0.81082888 0.80347594 0.81082888 0.74331551 0.68181818 0.73596257 0.71991979] mean value: 0.7603144617530807 key: train_roc_auc value: [0.89010604 0.85010604 0.90501925 0.83848534 0.83267101 0.83267101 0.85511401 0.84348534 0.82929967 0.82848534] mean value: 0.8505443046015629 key: test_jcc value: [0.36363636 0.46666667 0.61538462 0.58333333 0.53846154 0.58333333 0.4 0.36363636 0.375 0.41666667] mean value: 0.47061188811188814 key: train_jcc value: [0.74528302 0.66981132 0.77884615 0.63888889 0.62385321 0.62385321 0.67924528 0.64814815 0.62616822 0.62037037] mean value: 0.6654467830212485 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89
Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 
'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01778674 0.01836705 0.01631165 0.0161674 0.01639915 0.01633239 0.01626897 0.01639223 0.01605344 0.01620436] mean value: 0.016628336906433106 key: score_time value: [0.01443315 0.01439118 0.01305008 0.01270366 0.01318622 0.01309729 0.01328421 0.01310253 0.01316381 0.01318336] mean value: 0.013359546661376953 key: test_mcc value: [0.48480731 0.30776281 0.60784314 0.48361682 0.67911951 0.42011668 0.32322935 0.41317454 0.42011668 0.51178719] mean value: 0.4651574048278671 key: train_mcc value: [0.5430384 0.562911 0.56735299 0.52682013 0.55150264 0.55027254 0.55027254 0.53938961 0.53136366 0.54058816] mean value: 0.5463511669174994 key: test_accuracy value: [0.8625 0.75 0.9 0.86075949 0.89873418 0.79746835 0.79746835 0.84810127 0.79746835 0.87341772] mean value: 0.8385917721518987 key: train_accuracy value: [0.8569425 0.86956522 0.87096774 0.87114846 0.87254902 0.86554622 0.86554622 0.85994398 0.8627451 0.8557423 ] mean value: 0.8650696744335883 key: test_fscore value: [0.56 0.41176471 0.66666667 0.56 0.71428571 0.5 0.42857143 0.5 0.5 0.58333333] mean value: 0.5424621848739496 key: train_fscore value: [0.60465116 0.62348178 0.62601626 0.59649123 0.61603376 0.61290323 0.61290323 0.6031746 0.59836066 0.6023166 ] mean value: 0.6096332500516068 key: test_precision value: [0.5 0.30434783 0.66666667 0.5 0.58823529 0.38095238 0.35294118 0.46153846 0.38095238 0.53846154] mean value: 0.46740957252466203 key: train_precision value: [0.49367089 0.52380952 0.52380952 0.53125 0.53284672 0.51351351 0.51351351 0.5 
0.50694444 0.49056604] mean value: 0.5129924158230784 key: test_recall value: [0.63636364 0.63636364 0.66666667 0.63636364 0.90909091 0.72727273 0.54545455 0.54545455 0.72727273 0.63636364] mean value: 0.6666666666666666 key: train_recall value: [0.78 0.77 0.77777778 0.68 0.73 0.76 0.76 0.76 0.73 0.78 ] mean value: 0.7527777777777778 key: test_roc_auc value: [0.76745718 0.70223979 0.80392157 0.76671123 0.90307487 0.76804813 0.69184492 0.72125668 0.76804813 0.77406417] mean value: 0.7666666666666666 key: train_roc_auc value: [0.82474715 0.82790375 0.83188563 0.79114007 0.81288274 0.82136808 0.82136808 0.81811075 0.80718241 0.82403909] mean value: 0.8180627733998379 key: test_jcc value: [0.38888889 0.25925926 0.5 0.38888889 0.55555556 0.33333333 0.27272727 0.33333333 0.33333333 0.41176471] mean value: 0.37770845712022183 key: train_jcc value: [0.43333333 0.45294118 0.4556213 0.425 0.44512195 0.44186047 0.44186047 0.43181818 0.42690058 0.43093923] mean value: 0.43853966861639804 MCC on Blind test: 0.41 Accuracy on Blind test: 0.82
Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.03735423 0.041116 0.06425524 0.04450727 0.05348682 0.03800988 0.05037355 0.03638315 0.0449841 0.01800466] mean value: 0.042847490310668944 key: score_time value: [0.02528548 0.02139068 0.0293529 0.02545118 0.02264762 0.02740812 0.02261806 0.0254271 0.02451134 0.01420069] mean value: 0.023829317092895506 key: test_mcc value: [-0.10309229 0.22988544 0.38549554 0.06681376 0.71280758 0.16073112 0.38925288 0.24065419 0.00325735 0.43676935] mean value: 0.25225749117040186 key: train_mcc value: [0.34313123 0.31282773 0.31079126 0.36258434 0.28090475 0.33865482 0.38265799 0.36908667 0.32684012 0.30718987] mean value: 0.33346687688314813 key: test_accuracy value: [0.8 0.8375 0.875 0.78481013 0.93670886 0.83544304 0.87341772 0.86075949 0.79746835 0.88607595] mean value: 0.8487183544303798 key: train_accuracy value: [0.86535764 0.86115007 0.86115007 0.8697479 0.85294118 0.86414566 0.86834734 0.86554622 0.85854342 0.8557423 ] mean value: 0.8622671789613461 key: test_fscore value: [0. 0.31578947 0.375 0.19047619 0.70588235 0.23529412 0.44444444 0.26666667 0.11111111 0.47058824] mean value: 0.3115252592264976
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
key: train_fscore value: [0.4 0.36942675 0.36942675 0.41509434 0.34782609 0.39751553 0.44705882 0.43529412 0.39520958 0.37575758] mean value: 0.3952609555486557 key: test_precision value: [0. 0.375 0.75 0.2 1. 0.33333333 0.57142857 0.5 0.14285714 0.66666667] mean value: 0.4539285714285714 key: train_precision value: [0.53333333 0.50877193 0.5 0.55932203 0.45901639 0.52459016 0.54285714 0.52857143 0.49253731 0.47692308] mean value: 0.5125922816217733 key: test_recall value: [0. 0.27272727 0.25 0.18181818 0.54545455 0.18181818 0.36363636 0.18181818 0.09090909 0.36363636] mean value: 0.24318181818181817 key: train_recall value: [0.32 0.29 0.29292929 0.33 0.28 0.32 0.38 0.37 0.33 0.31 ] mean value: 0.3222929292929293 key: test_roc_auc value: [0.46376812 0.60013175 0.61764706 0.53208556 0.77272727 0.56149733 0.65975936 0.57620321 0.5013369 0.6671123 ] mean value: 0.5952268852204914 key: train_roc_auc value: [0.6371615 0.6221615 0.62284901 0.64382736 0.61312704 0.63638436 0.66394137 0.65812704 0.6373127 0.6273127 ] mean value: 0.6362204586206717 key: test_jcc value: [0. 
0.1875 0.23076923 0.10526316 0.54545455 0.13333333 0.28571429 0.15384615 0.05882353 0.30769231] mean value: 0.20083965441163584 key: train_jcc value: [0.25 0.2265625 0.2265625 0.26190476 0.21052632 0.24806202 0.28787879 0.27819549 0.24626866 0.23134328] mean value: 0.24673043100972114 MCC on Blind test: 0.03 Accuracy on Blind test: 0.8 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), 
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.04181695 0.0169158 0.02974534 0.01720977 0.01437426 0.03081775 0.03508019 0.01550221 0.03039002 0.02925754] mean value: 0.02611098289489746 key: score_time value: [0.12297058 0.04593039 0.05504537 0.06402946 0.04561687 0.09929895 0.05882621 0.05893588 0.04028583 0.04817009] mean value: 0.06391096115112305 key: test_mcc value: [ 0.11224603 0.16855623 0.46987149 0.28152101 0.16794369 0.16794369 0.11138831 0. 
-0.09288407 0.28152101] mean value: 0.16681073908647326 key: train_mcc value: [0.44887065 0.40655068 0.38581163 0.42142248 0.41706698 0.43847856 0.4590301 0.34027251 0.39495657 0.39552265] mean value: 0.41079827922829903 key: test_accuracy value: [0.85 0.8625 0.8875 0.87341772 0.86075949 0.86075949 0.84810127 0.86075949 0.81012658 0.87341772] mean value: 0.8587341772151899 key: train_accuracy value: [0.89200561 0.88639551 0.88499299 0.88795518 0.88795518 0.8907563 0.89355742 0.87815126 0.88515406 0.88515406] mean value: 0.887207758278627 key: test_fscore value: [0.14285714 0.15384615 0.4 0.16666667 0.15384615 0.15384615 0.14285714 0. 0. 0.16666667] mean value: 0.14805860805860807 key: train_fscore value: [0.39370079 0.36220472 0.32786885 0.40298507 0.36507937 0.4 0.40625 0.304 0.33870968 0.34920635] mean value: 0.3650004830601975 key: test_precision value: [0.33333333 0.5 1. 1. 0.5 0.5 0.33333333 0. 0. 1. ] mean value: 0.5166666666666666 key: train_precision value: [0.92592593 0.85185185 0.86956522 0.79411765 0.88461538 0.86666667 0.92857143 0.76 0.875 0.84615385] mean value: 0.8602467968235231 key: test_recall value: [0.09090909 0.09090909 0.25 0.09090909 0.09090909 0.09090909 0.09090909 0. 0. 0.09090909] mean value: 0.08863636363636364 key: train_recall value: [0.25 0.23 0.2020202 0.27 0.23 0.26 0.26 0.19 0.21 0.22 ] mean value: 0.23220202020202022 key: test_roc_auc value: [0.53096179 0.53820817 0.625 0.54545455 0.5381016 0.5381016 0.53074866 0.5 0.47058824 0.54545455] mean value: 0.5362619158335271 key: train_roc_auc value: [0.62336868 0.61173736 0.5985671 0.62929967 0.612557 0.62674267 0.62837134 0.59011401 0.602557 0.60674267] mean value: 0.6130057504977346 key: test_jcc value: [0.07692308 0.08333333 0.25 0.09090909 0.08333333 0.08333333 0.07692308 0. 0. 
0.09090909] mean value: 0.08356643356643356 key: train_jcc value: [0.24509804 0.22115385 0.19607843 0.25233645 0.22330097 0.25 0.25490196 0.17924528 0.2038835 0.21153846] mean value: 0.2237536936701273 MCC on Blind test: -0.09 Accuracy on Blind test: 0.82 Model_name: SVM Model func: SVC(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', 
RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.03786564 0.03764153 0.07014346 0.03748584 0.03807282 0.03875828 0.02647471 0.02575278 0.02461553 0.02594876] mean value: 0.036275935173034665 key: score_time value: [0.01906037 0.01942945 0.01911497 0.01885724 0.01908636 0.03045058 0.01306677 0.01287198 0.01263452 0.01324749] mean value: 0.017781972885131836 key: test_mcc value: [0.28178291 0.34676496 0.26782449 0. 0. 0.49398293 0.16794369 0.28152101 0.11138831 0. ] mean value: 0.1951208310227963 key: train_mcc value: [0.47977517 0.47977517 0.3499354 0.50061187 0.40746174 0.39619069 0.4700762 0.41846368 0.52519566 0.47982099] mean value: 0.45073065750677455 key: test_accuracy value: [0.875 0.875 0.8625 0.86075949 0.86075949 0.89873418 0.86075949 0.87341772 0.84810127 0.86075949] mean value: 0.8675791139240506 key: train_accuracy value: [0.89621318 0.89621318 0.88078541 0.89915966 0.88655462 0.88515406 0.89495798 0.88795518 0.90336134 0.89635854] mean value: 0.8926713181766395 key: test_fscore value: [0.16666667 0.375 0.15384615 0. 0. 0.42857143 0.15384615 0.16666667 0.14285714 0. 
] mean value: 0.15874542124542124 key: train_fscore value: [0.421875 0.421875 0.26086957 0.4375 0.33057851 0.31666667 0.40944882 0.3442623 0.48888889 0.421875 ] mean value: 0.38538397471492464 key: test_precision value: [1. 0.6 1. 0. 0. 1. 0.5 1. 0.33333333 0. ] mean value: 0.5433333333333333 key: train_precision value: [0.96428571 0.96428571 0.9375 1. 0.95238095 0.95 0.96296296 0.95454545 0.94285714 0.96428571] mean value: 0.9593103655603655 key: test_recall value: [0.09090909 0.27272727 0.08333333 0. 0. 0.27272727 0.09090909 0.09090909 0.09090909 0. ] mean value: 0.09924242424242424 key: train_recall value: [0.27 0.27 0.15151515 0.28 0.2 0.19 0.26 0.21 0.33 0.27 ] mean value: 0.24315151515151517 key: test_roc_auc value: [0.54545455 0.62187088 0.54166667 0.5 0.5 0.63636364 0.5381016 0.54545455 0.53074866 0.5 ] mean value: 0.5459660544059521 key: train_roc_auc value: [0.63418434 0.63418434 0.57494324 0.64 0.59918567 0.59418567 0.62918567 0.60418567 0.66337134 0.63418567] mean value: 0.620761159640681 key: test_jcc value: [0.09090909 0.23076923 0.08333333 0. 0. 0.27272727 0.08333333 0.09090909 0.07692308 0. ] mean value: 0.09289044289044289 key: train_jcc value: [0.26732673 0.26732673 0.15 0.28 0.1980198 0.18811881 0.25742574 0.20792079 0.32352941 0.26732673] mean value: 0.24069947582993595 MCC on Blind test: 0.39 Accuracy on Blind test: 0.88 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [4.76756167 5.27475548 5.63623118 5.40502548 6.04161572 5.8417573 6.6231823 6.51457071 5.34702921 3.33198285] mean value: 5.47837119102478 key: score_time value: [0.01546693 0.0225172 0.01796293 0.02164364 0.04560542 0.0318253 0.03422785 0.02604556 0.01566219 0.02625656] mean value: 0.025721359252929687 key: test_mcc value: [0.43754361 0.51286858 0.61325296 0.36631016 0.66135521 0.59219173 0.43119194 0.43676935 0.54627358 0.71339159] mean value: 0.5311148709191367 key: train_mcc value: [0.95884344 0.98833851 0.97045246 0.98250043 0.97661349 0.97071011 0.98837134 0.95893173 0.95893173 0.97073494] mean value: 0.9724428173839278 key: test_accuracy value: [0.8875 0.875 0.9125 0.84810127 0.92405063 0.91139241 0.87341772 0.88607595 0.89873418 0.93670886] mean value: 0.8953481012658228 key: train_accuracy value: [0.99018233 0.99719495 0.99298738 0.99579832 0.99439776 0.9929972 0.99719888 0.99019608 0.99019608 0.9929972 ] mean value: 0.9934146168986528 key: test_fscore value: 
[0.47058824 0.58333333 0.63157895 0.45454545 0.7 0.63157895 0.5 0.47058824 0.6 0.73684211] mean value: 0.5779055258467023 key: train_fscore value: [0.96410256 0.98989899 0.97435897 0.98492462 0.97979798 0.97461929 0.99 0.96446701 0.96446701 0.97435897] mean value: 0.9760995405125446 key: test_precision value: [0.66666667 0.53846154 0.85714286 0.45454545 0.77777778 0.75 0.55555556 0.66666667 0.66666667 0.875 ] mean value: 0.6808483183483184 key: train_precision value: [0.98947368 1. 0.98958333 0.98989899 0.98979592 0.98969072 0.99 0.97938144 0.97938144 1. ] mean value: 0.9897205534057619 key: test_recall value: [0.36363636 0.63636364 0.5 0.45454545 0.63636364 0.54545455 0.45454545 0.36363636 0.54545455 0.63636364] mean value: 0.5136363636363637 key: train_recall value: [0.94 0.98 0.95959596 0.98 0.97 0.96 0.99 0.95 0.95 0.95 ] mean value: 0.9629595959595959 key: test_roc_auc value: [0.66732543 0.77470356 0.74264706 0.68315508 0.80347594 0.75802139 0.69786096 0.6671123 0.75066845 0.81082888] mean value: 0.7355799038983183 key: train_roc_auc value: [0.96918434 0.99 0.97898365 0.98918567 0.98418567 0.97918567 0.99418567 0.97337134 0.97337134 0.975 ] mean value: 0.9806653328884811 key: test_jcc value: [0.30769231 0.41176471 0.46153846 0.29411765 0.53846154 0.46153846 0.33333333 0.30769231 0.42857143 0.58333333] mean value: 0.41280435251023484 key: train_jcc value: [0.93069307 0.98 0.95 0.97029703 0.96039604 0.95049505 0.98019802 0.93137255 0.93137255 0.95 ] mean value: 0.9534824305960008 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04919791 0.04214954 0.03464055 0.04058719 0.03853202 0.03963971 0.0449214 0.04226875 0.04017758 0.04114819] mean value: 0.041326284408569336 key: score_time value: [0.01094723 0.00959659 0.00930953 0.00933218 0.01026773 0.00946903 0.00933957 0.0098052 0.00962543 0.00986981] mean value: 0.009756231307983398 key: test_mcc value: [0.51864618 0.55216696 0.64550223 0.57937053 0.57937053 0.77643684 0.31611031 0.7090125 0.43119194 0.48361682] mean value: 0.5591424848620753 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9 0.875 0.9125 0.88607595 0.88607595 0.94936709 0.84810127 0.92405063 0.87341772 0.86075949] mean value: 0.8915348101265823 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.55555556 0.61538462 0.69565217 0.64 0.64 0.8 0.4 0.75 0.5 0.56 ] mean value: 0.6156592344853214 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.71428571 0.53333333 0.72727273 0.57142857 0.57142857 0.88888889 0.44444444 0.69230769 0.55555556 0.5 ] mean value: 0.6198945498945498 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.45454545 0.72727273 0.66666667 0.72727273 0.72727273 0.72727273 0.36363636 0.81818182 0.45454545 0.63636364] mean value: 0.6303030303030304 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71277997 0.81291173 0.81127451 0.81951872 0.81951872 0.85628342 0.64505348 0.87967914 0.69786096 0.76671123] mean value: 0.7821591877857863 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.38461538 0.44444444 0.53333333 0.47058824 0.47058824 0.66666667 0.25 0.6 0.33333333 0.38888889] mean value: 0.4542458521870286 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.94 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.17240572 0.16647243 0.16662502 0.16827416 0.15932035 0.15240288 0.14529085 0.15759325 0.16292405 0.15607858] mean value: 0.16073873043060302 key: score_time value: [0.02106047 0.02116942 0.02069497 0.02151155 0.02009916 0.02284741 0.01929522 0.02032804 0.02048707 0.02063346] mean value: 0.020812678337097167 key: test_mcc value: [0.49436016 0.22988544 0.54611868 0.30268562 0.30268562 0.40742332 0.43676935 0.40742332 0.07388506 0.57419245] mean value: 0.37754290201872565 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.9 0.8375 0.9 0.87341772 0.87341772 0.88607595 0.88607595 0.88607595 0.83544304 0.91139241] mean value: 0.8789398734177215 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.42857143 0.31578947 0.5 0.28571429 0.28571429 0.4 0.47058824 0.4 0.13333333 0.53333333] mean value: 0.3753044375644995 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.375 1. 0.66666667 0.66666667 0.75 0.66666667 0.75 0.25 1. ] mean value: 0.7125 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.27272727 0.27272727 0.33333333 0.18181818 0.18181818 0.27272727 0.36363636 0.27272727 0.09090909 0.36363636] mean value: 0.2606060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63636364 0.60013175 0.66666667 0.58355615 0.58355615 0.6290107 0.6671123 0.6290107 0.52339572 0.68181818] mean value: 0.6200621948384097 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.27272727 0.1875 0.33333333 0.16666667 0.16666667 0.25 0.30769231 0.25 0.07142857 0.36363636] mean value: 0.2369651182151182 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.44 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01228523 0.01270628 0.01270533 0.01234698 0.01202917 0.01263547 0.01246166 0.01153898 0.01213789 0.01210713] mean value: 0.012295413017272949 key: score_time value: [0.00941229 0.0091939 0.00986099 0.00935078 0.00958705 0.00989294 0.00903296 0.00943208 0.00941062 0.00903535] mean value: 0.00942089557647705 key: test_mcc value: [0.17835014 0.23888665 0.18280094 0.28674237 0.21594923 0.36631016 0.23726791 0.36631016 0.27141973 0.08594704] mean value: 0.2429984334695301 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.8125 0.8125 0.83544304 0.79746835 0.84810127 0.81012658 0.84810127 0.79746835 0.79746835] mean value: 0.8171677215189873 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.28571429 0.34782609 0.28571429 0.38095238 0.33333333 0.45454545 0.34782609 0.45454545 0.38461538 0.2 ] mean value: 0.3475072753333623 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.3 0.33333333 0.33333333 0.4 0.30769231 0.45454545 0.33333333 0.45454545 0.33333333 0.22222222] mean value: 0.3472338772338772 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.27272727 0.36363636 0.25 0.36363636 0.36363636 0.45454545 0.36363636 0.45454545 0.45454545 0.18181818] mean value: 0.3522727272727273 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.585639 0.62384717 0.58088235 0.63770053 0.61564171 0.68315508 0.62299465 0.68315508 0.65374332 0.5394385 ] mean value: 0.6226197395954429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.16666667 0.21052632 0.16666667 0.23529412 0.2 0.29411765 0.21052632 0.29411765 0.23809524 0.11111111] mean value: 0.21271217258833358 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.82 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.46597672 2.42714763 2.48374224 2.4591248 2.46944237 2.50179291 2.38671589 2.47136688 2.56136847 2.70485163] mean value: 2.493152952194214 key: score_time value: [0.10462785 0.09992313 0.09720373 0.10394239 0.10339022 0.10226011 0.10427213 0.0963974 0.10097837 0.09991646] mean value: 0.10129117965698242 key: test_mcc value: [0.40104758 0.51864618 0.73714245 0.49398293 0.40742332 0.64628973 0.59219173 0.64628973 0.34595509 0.64658323] mean value: 0.5435551986075632 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8875 0.9 0.9375 0.89873418 0.88607595 0.92405063 0.91139241 0.92405063 0.87341772 0.92405063] mean value: 0.9066772151898734 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.30769231 0.55555556 0.73684211 0.42857143 0.4 0.625 0.63157895 0.625 0.375 0.66666667] mean value: 0.5351907011117537 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.71428571 1. 1. 0.75 1. 0.75 1. 0.6 0.85714286] mean value: 0.8671428571428571 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.18181818 0.45454545 0.58333333 0.27272727 0.27272727 0.45454545 0.54545455 0.45454545 0.27272727 0.54545455] mean value: 0.40378787878787875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.59090909 0.71277997 0.79166667 0.63636364 0.6290107 0.72727273 0.75802139 0.72727273 0.62165775 0.76537433] mean value: 0.6960328993257382 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.18181818 0.38461538 0.58333333 0.27272727 0.25 0.45454545 0.46153846 0.45454545 0.23076923 0.5 ] mean value: 0.3773892773892774 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.53 Accuracy on Blind test: 0.89 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [2.23748446 1.20426846 1.20782495 1.06972313 1.0778296 1.18399477 1.38125014 1.35519075 1.36815524 1.3413291 ] mean value: 1.3427050590515137 key: score_time value: [0.2090075 0.18635893 0.19955277 0.20219612 0.1430521 0.22746968 0.17598581 0.18938136 0.17117977 0.23886013] mean value: 0.19430441856384278 key: test_mcc value: [0.28178291 0.34676496 0.6146363 0.40070776 0.40070776 0.64628973 0.43676935 0.49398293 0.11138831 0.49612241] mean value: 0.42291524210944176 key: train_mcc value: [0.80953731 0.81042317 0.79443166 0.78313752 0.78313752 0.77644144 0.77644144 0.8164158 0.78313752 0.78979757] mean value: 0.7922900961329172 key: test_accuracy value: [0.875 0.875 0.9125 0.88607595 0.88607595 0.92405063 0.88607595 0.89873418 0.84810127 0.89873418] mean value: 0.8890348101265823 key: train_accuracy value: [0.95652174 0.95652174 0.95371669 0.95098039 0.95098039 0.94957983 0.94957983 0.95798319 0.95098039 0.95238095] mean value: 0.9529225154297343 key: test_fscore value: [0.16666667 0.375 0.58823529 0.30769231 0.30769231 0.625 0.47058824 0.42857143 0.14285714 0.5 ] mean value: 0.3912303382891618 key: train_fscore value: [0.82080925 0.81656805 0.80473373 0.79289941 0.79289941 0.78571429 0.78571429 0.8255814 0.79289941 0.8 ] mean value: 0.8017819215332322 key: test_precision value: [1. 0.6 1. 1. 1. 1. 0.66666667 1. 0.33333333 0.8 ] mean value: 0.84 key: train_precision value: [0.97260274 1. 
0.97142857 0.97101449 0.97101449 0.97058824 0.97058824 0.98611111 0.97101449 0.97142857] mean value: 0.9755790942543386 key: test_recall value: [0.09090909 0.27272727 0.41666667 0.18181818 0.18181818 0.45454545 0.36363636 0.27272727 0.09090909 0.36363636] mean value: 0.2689393939393939 key: train_recall value: [0.71 0.69 0.68686869 0.67 0.67 0.66 0.66 0.71 0.67 0.68 ] mean value: 0.6806868686868687 key: test_roc_auc value: [0.54545455 0.62187088 0.70833333 0.59090909 0.59090909 0.72727273 0.6671123 0.63636364 0.53074866 0.67446524] mean value: 0.6293439510191429 key: train_roc_auc value: [0.85336868 0.845 0.84180568 0.83337134 0.83337134 0.82837134 0.82837134 0.85418567 0.83337134 0.83837134] mean value: 0.8389588038350678 key: test_jcc value: [0.09090909 0.23076923 0.41666667 0.18181818 0.18181818 0.45454545 0.30769231 0.27272727 0.07692308 0.33333333] mean value: 0.2547202797202797 key: train_jcc value: [0.69607843 0.69 0.67326733 0.65686275 0.65686275 0.64705882 0.64705882 0.7029703 0.65686275 0.66666667] mean value: 0.6693688604154533 MCC on Blind test: 0.49 Accuracy on Blind test: 0.89 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0197196 0.03598475 0.03622699 0.03120279 0.01471877 0.03621674 0.02270555 0.03164887 0.03560638 0.04625273] mean value: 0.031028318405151366 key: score_time value: [0.01119256 0.02131534 0.09772825 0.01156521 0.011374 0.04016137 0.01889348 0.02915859 0.02790117 0.02027225] mean value: 0.028956222534179687 key: test_mcc value: [-0.10309229 0.22988544 0.38549554 0.06681376 0.71280758 0.16073112 0.38925288 0.24065419 0.00325735 0.43676935] mean value: 0.25225749117040186 key: train_mcc value: [0.34313123 0.31282773 0.31079126 0.36258434 0.28090475 0.33865482 0.38265799 0.36908667 0.32684012 0.30718987] mean value: 0.33346687688314813 key: test_accuracy value: [0.8 0.8375 0.875 0.78481013 0.93670886 0.83544304 0.87341772 0.86075949 0.79746835 0.88607595] mean value: 0.8487183544303798 key: train_accuracy value: [0.86535764 0.86115007 0.86115007 0.8697479 0.85294118 0.86414566 0.86834734 0.86554622 0.85854342 0.8557423 ] mean value: 0.8622671789613461 key: test_fscore value: [0. 0.31578947 0.375 0.19047619 0.70588235 0.23529412 0.44444444 0.26666667 0.11111111 0.47058824] mean value: 0.3115252592264976 key: train_fscore value: [0.4 0.36942675 0.36942675 0.41509434 0.34782609 0.39751553 0.44705882 0.43529412 0.39520958 0.37575758] mean value: 0.3952609555486557 key: test_precision value: [0. 0.375 0.75 0.2 1. 0.33333333 0.57142857 0.5 0.14285714 0.66666667] mean value: 0.4539285714285714 key: train_precision value: [0.53333333 0.50877193 0.5 0.55932203 0.45901639 0.52459016 0.54285714 0.52857143 0.49253731 0.47692308] mean value: 0.5125922816217733 key: test_recall value: [0. 
0.27272727 0.25 0.18181818 0.54545455 0.18181818 0.36363636 0.18181818 0.09090909 0.36363636] mean value: 0.24318181818181817 key: train_recall value: [0.32 0.29 0.29292929 0.33 0.28 0.32 0.38 0.37 0.33 0.31 ] mean value: 0.3222929292929293 key: test_roc_auc value: [0.46376812 0.60013175 0.61764706 0.53208556 0.77272727 0.56149733 0.65975936 0.57620321 0.5013369 0.6671123 ] mean value: 0.5952268852204914 key: train_roc_auc value: [0.6371615 0.6221615 0.62284901 0.64382736 0.61312704 0.63638436 0.66394137 0.65812704 0.6373127 0.6273127 ] mean value: 0.6362204586206717 key: test_jcc value: [0. 0.1875 0.23076923 0.10526316 0.54545455 0.13333333 0.28571429 0.15384615 0.05882353 0.30769231] mean value: 0.20083965441163584 key: train_jcc value: [0.25 0.2265625 0.2265625 0.26190476 0.21052632 0.24806202 0.28787879 0.27819549 0.24626866 0.23134328] mean value: 0.24673043100972114 MCC on Blind test: 0.03 Accuracy on Blind test: 0.8 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [1.20054126 3.52299976 2.68894696 2.35090446 0.53907275 7.61027193 2.27476096 2.3721981 2.27101231 2.30523801] mean value: 2.713594651222229 key: score_time value: [0.01278949 0.02615476 0.01358294 0.01904464 0.01230979 0.01299477 0.01300144 0.0128026 0.01257706 0.01270843] mean value: 0.014796590805053711 key: test_mcc value: [0.64666979 0.57839262 0.74715612 0.77643684 0.72659961 0.83459145 0.68315508 0.78877005 0.54287929 0.77643684] mean value: 0.7101087702417325 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.925 0.9 0.9375 0.94936709 0.93670886 0.96202532 0.92405063 0.94936709 0.88607595 0.94936709] mean value: 0.9319462025316456 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.625 0.63636364 0.7826087 0.8 0.76190476 0.84210526 0.72727273 0.81818182 0.60869565 0.8 ] mean value: 0.7402132554706925 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.63636364 0.81818182 0.88888889 0.8 1. 0.72727273 0.81818182 0.58333333 0.88888889] mean value: 0.8161111111111111 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.45454545 0.63636364 0.75 0.72727273 0.72727273 0.72727273 0.72727273 0.81818182 0.63636364 0.72727273] mean value: 0.6931818181818182 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.72727273 0.78919631 0.86029412 0.85628342 0.84893048 0.86363636 0.84157754 0.89438503 0.78141711 0.85628342] mean value: 0.8319276524839185 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.45454545 0.46666667 0.64285714 0.66666667 0.61538462 0.72727273 0.57142857 0.69230769 0.4375 0.66666667] mean value: 0.5941296203796204 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.82 Accuracy on Blind test: 0.95 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.17741966 0.09170365 0.13291836 0.10037684 0.14957428 0.16964102 0.12517238 0.10618401 0.09255886 0.09366941] mean value: 0.12392184734344483 key: score_time value: [0.03746223 0.0251298 0.03144145 0.04300594 0.02535176 0.02620196 0.02602458 0.03189564 0.01941037 0.03404975] mean value: 0.02999734878540039 key: test_mcc value: [0.49671738 0.44219444 0.74715612 0.61039985 0.52515049 0.64432685 0.50667099 0.51791806 0.47192513 0.54627358] mean value: 0.5508732880524055 key: train_mcc value: [0.79768654 0.79768654 0.78545027 0.79773102 0.77082394 0.76944991 0.79638654 0.81288529 0.81288529 0.79249782] mean value: 0.7933483163681401 key: test_accuracy value: [0.9 
0.8625 0.9375 0.89873418 0.86075949 0.89873418 0.88607595 0.89873418 0.87341772 0.89873418] mean value: 0.8915189873417722 key: train_accuracy value: [0.95231417 0.95231417 0.94950912 0.95238095 0.94677871 0.94677871 0.95238095 0.95658263 0.95658263 0.95098039] mean value: 0.9516602433399727 key: test_fscore value: [0.5 0.52173913 0.7826087 0.66666667 0.59259259 0.69230769 0.57142857 0.55555556 0.54545455 0.6 ] mean value: 0.6028353450092581 key: train_fscore value: [0.82474227 0.82474227 0.81443299 0.82474227 0.8 0.79787234 0.82291667 0.83597884 0.83597884 0.82051282] mean value: 0.8201919293377125 key: test_precision value: [0.8 0.5 0.81818182 0.61538462 0.5 0.6 0.6 0.71428571 0.54545455 0.66666667] mean value: 0.635997335997336 key: train_precision value: [0.85106383 0.85106383 0.83157895 0.85106383 0.84444444 0.85227273 0.85869565 0.88764045 0.88764045 0.84210526] mean value: 0.8557569422655507 key: test_recall value: [0.36363636 0.54545455 0.75 0.72727273 0.72727273 0.81818182 0.54545455 0.45454545 0.54545455 0.54545455] mean value: 0.6022727272727273 key: train_recall value: [0.8 0.8 0.7979798 0.8 0.76 0.75 0.79 0.79 0.79 0.8 ] mean value: 0.7877979797979798 key: test_roc_auc value: [0.67457181 0.72924901 0.86029412 0.82687166 0.80481283 0.86497326 0.74331551 0.71256684 0.73596257 0.75066845] mean value: 0.7703286057506007 key: train_roc_auc value: [0.88858075 0.88858075 0.88596058 0.88859935 0.86859935 0.86441368 0.88441368 0.88685668 0.88685668 0.88778502] mean value: 0.8830646513812075 key: test_jcc value: [0.33333333 0.35294118 0.64285714 0.5 0.42105263 0.52941176 0.4 0.38461538 0.375 0.42857143] mean value: 0.43677828621327075 key: train_jcc value: [0.70175439 0.70175439 0.68695652 0.70175439 0.66666667 0.66371681 0.69911504 0.71818182 0.71818182 0.69565217] mean value: 0.6953734014984293 MCC on Blind test: 0.53 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.05694079 0.06395292 0.07268739 0.05766678 0.03244686 0.039891 0.04338455 0.04891658 0.05110669 0.06775498] mean value: 0.05347485542297363 key: score_time value: [0.02860403 0.03798938 0.03443909 0.03022146 0.05056453 0.05091286 0.02191162 0.01679182 0.03358102 0.03731632] mean value: 0.0342332124710083 key: test_mcc value: [0.3033031 0.47299078 0.54492569 0.2605877 0.64658323 0.59219173 0.59219173 0.40742332 0.34979201 0.51791806] mean value: 0.46879073368999474 key: train_mcc value: [0.52799211 0.54819448 0.49772617 0.50359962 0.47984635 0.50675531 0.49820394 0.50147904 0.54318658 0.49630465] mean value: 0.5103288250765816 key: test_accuracy value: [0.875 0.875 0.9 0.84810127 0.92405063 0.91139241 0.91139241 0.88607595 0.86075949 0.89873418] mean value: 0.8890506329113924 key: train_accuracy value: [0.89621318 0.90322581 0.89621318 0.89495798 0.89215686 0.89495798 0.89355742 0.89355742 0.90336134 0.89215686] mean value: 0.8960358056265985 key: test_fscore value: [0.28571429 0.54545455 0.55555556 0.33333333 0.66666667 0.63157895 0.63157895 0.4 0.42105263 0.55555556] mean value: 0.5026490468595731 key: train_fscore value: [0.57954545 0.58682635 0.53164557 0.54545455 0.51572327 0.5508982 0.54216867 0.54761905 0.57668712 0.5443787 ] mean value: 0.5520946928065821 key: test_precision value: [0.66666667 
0.54545455 0.83333333 0.42857143 0.85714286 0.75 0.75 0.75 0.5 0.71428571] mean value: 0.6795454545454546 key: train_precision value: [0.67105263 0.73134328 0.71186441 0.69230769 0.69491525 0.68656716 0.68181818 0.67647059 0.74603175 0.66666667] mean value: 0.6959037615416671 key: test_recall value: [0.18181818 0.54545455 0.41666667 0.27272727 0.54545455 0.54545455 0.54545455 0.27272727 0.36363636 0.45454545] mean value: 0.41439393939393937 key: train_recall value: [0.51 0.49 0.42424242 0.45 0.41 0.46 0.45 0.46 0.47 0.46 ] mean value: 0.4584242424242424 key: test_roc_auc value: [0.58366271 0.73649539 0.70098039 0.60695187 0.76537433 0.75802139 0.75802139 0.6290107 0.65240642 0.71256684] mean value: 0.6903491436100132 key: train_roc_auc value: [0.73460848 0.73031811 0.69827756 0.70871336 0.69034202 0.71289902 0.70789902 0.71208469 0.72197068 0.71127036] mean value: 0.7128383307545542 key: test_jcc value: [0.16666667 0.375 0.38461538 0.2 0.5 0.46153846 0.46153846 0.25 0.26666667 0.38461538] mean value: 0.3450641025641026 key: train_jcc value: [0.408 0.41525424 0.36206897 0.375 0.34745763 0.38016529 0.37190083 0.37704918 0.40517241 0.37398374] mean value: 0.3816052279584871 MCC on Blind test: 0.39 Accuracy on Blind test: 0.86 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03733659 0.11061597 0.06538081 0.06376076 0.06266046 0.06085181 0.07293844 0.06197333 0.06072569 0.04202867] mean value: 0.0638272523880005 key: score_time value: [0.01605201 0.05469155 0.03833151 0.080966 0.02012324 0.04428625 0.04422903 0.04278708 0.03952503 0.04887033] mean value: 0.04298620223999024 key: test_mcc value: [0.45953084 0.28498089 0.72784016 0.49612241 0.54709854 0.6166353 0.41220189 0.40070776 0.2605877 0.30268562] mean value: 0.4508391113755412 key: train_mcc value: [0.62022622 0.3410488 0.65083284 0.6562802 0.58642176 0.6405191 0.66211276 0.69245266 0.66378948 0.56951461] mean value: 0.6083198435919134 key: test_accuracy value: [0.825 0.475 0.925 0.89873418 0.82278481 0.91139241 0.82278481 0.88607595 0.84810127 0.87341772] mean value: 0.8288291139240507 key: train_accuracy value: [0.86255259 0.56521739 0.88920056 0.92577031 0.82913165 0.92296919 0.89915966 0.93277311 0.92717087 0.91036415] mean value: 0.8664309482558802 key: test_fscore value: [0.53333333 0.34375 0.76923077 0.5 0.58823529 0.66666667 0.5 0.30769231 0.33333333 0.28571429] mean value: 0.4827955990088343 key: train_fscore value: [0.65492958 0.38976378 0.69019608 0.67080745 0.61392405 0.64968153 0.70491803 0.7037037 0.67901235 0.53623188] mean value: 0.6293168434362774 key: test_precision value: [0.42105263 0.20754717 0.71428571 0.8 0.43478261 0.7 0.41176471 1. 
0.42857143 0.66666667] mean value: 0.5784670925492083 key: train_precision value: [0.50543478 0.24264706 0.56410256 0.8852459 0.44907407 0.89473684 0.59722222 0.91935484 0.88709677 0.97368421] mean value: 0.6918599269005234 key: test_recall value: [0.72727273 1. 0.83333333 0.36363636 0.90909091 0.63636364 0.63636364 0.18181818 0.27272727 0.18181818] mean value: 0.5742424242424242 key: train_recall value: [0.93 0.99 0.88888889 0.54 0.97 0.51 0.86 0.57 0.55 0.37 ] mean value: 0.7178888888888889 key: test_roc_auc value: [0.78392622 0.69565217 0.8872549 0.67446524 0.85895722 0.79612299 0.74465241 0.59090909 0.60695187 0.58355615] mean value: 0.7222448267844688 key: train_roc_auc value: [0.89077488 0.74296085 0.88906985 0.76429967 0.88809446 0.75011401 0.88276873 0.78092834 0.76929967 0.68418567] mean value: 0.8042496131294506 key: test_jcc value: [0.36363636 0.20754717 0.625 0.33333333 0.41666667 0.5 0.33333333 0.18181818 0.2 0.16666667] mean value: 0.33280017152658664 key: train_jcc value: [0.48691099 0.24205379 0.52694611 0.5046729 0.44292237 0.48113208 0.5443038 0.54285714 0.51401869 0.36633663] mean value: 0.46521545049547125 MCC on Blind test: 0.53 Accuracy on Blind test: 0.89 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0385313 0.06978273 0.05149913 0.08231616 0.07145095 0.06117988 0.04736829 0.06787324 0.03546977 0.06965303] mean value: 0.05951244831085205 key: score_time value: [0.02941966 0.02366066 0.01590729 0.03571057 0.03261065 0.04180789 0.02399945 0.01923251 0.01207161 0.04127169] mean value: 0.02756919860839844 key: test_mcc value: [0.61126063 0.43754361 0.74715612 0.28152101 0.49398293 0.51178719 0.49398293 0.40070776 0.57754011 0.57478846] mean value: 0.5130270755632105 key: train_mcc value: [0.72265053 0.54542212 0.74972636 0.45004776 0.49882429 0.79137057 0.39848852 0.5438872 0.74247904 0.78344513] mean value: 0.6226341527355859 key: test_accuracy value: [0.9 0.8875 0.9375 0.87341772 0.89873418 0.87341772 0.89873418 0.88607595 0.89873418 0.91139241] mean value: 0.8965506329113924 key: train_accuracy value: [0.914446 0.90603086 0.9312763 0.89215686 0.89915966 0.94677871 0.88515406 0.90616246 0.94257703 0.95098039] mean value: 0.9174722343355295 key: test_fscore value: [0.66666667 0.47058824 0.7826087 0.16666667 0.42857143 0.58333333 0.42857143 0.30769231 0.63636364 0.58823529] mean value: 0.5059297692929406 key: train_fscore value: [0.75303644 0.4962406 0.78222222 0.384 0.44615385 0.82075472 0.30508475 0.5037037 0.76023392 0.80225989] mean value: 0.6053690078708643 key: test_precision value: [0.61538462 0.66666667 0.81818182 1. 1. 0.53846154 1. 1. 0.63636364 0.83333333] mean value: 0.8108391608391609 key: train_precision value: [0.63265306 1. 0.6984127 0.96 0.96666667 0.77678571 1. 
0.97142857 0.91549296 0.92207792] mean value: 0.8843517591842541 key: test_recall value: [0.72727273 0.36363636 0.75 0.09090909 0.27272727 0.63636364 0.27272727 0.18181818 0.63636364 0.45454545] mean value: 0.43863636363636366 key: train_recall value: [0.93 0.33 0.88888889 0.24 0.29 0.87 0.18 0.34 0.65 0.71 ] mean value: 0.5428888888888889 key: test_roc_auc value: [0.82740448 0.66732543 0.86029412 0.54545455 0.63636364 0.77406417 0.63636364 0.59090909 0.78877005 0.71991979] mean value: 0.7046868945206541 key: train_roc_auc value: [0.92095432 0.665 0.91349982 0.61918567 0.64418567 0.91464169 0.59 0.66918567 0.82011401 0.85011401] mean value: 0.7606880852136629 key: test_jcc value: [0.5 0.30769231 0.64285714 0.09090909 0.27272727 0.41176471 0.27272727 0.18181818 0.46666667 0.41666667] mean value: 0.3563829307946955 key: train_jcc value: [0.6038961 0.33 0.64233577 0.23762376 0.28712871 0.696 0.18 0.33663366 0.61320755 0.66981132] mean value: 0.45966368768578514 MCC on Blind test: 0.5 Accuracy on Blind test: 0.86 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.45761418 0.44829988 0.63949275 0.64037538 0.63990831 0.52243495 0.23520517 0.24666142 0.24852872 0.23637342] mean value: 0.4314894199371338 key: score_time value: [0.04816031 0.04698753 0.04737043 0.02180266 0.04970551 0.01631689 0.01759291 0.01567197 0.01605535 0.0212636 ] mean value: 0.030092716217041016 key: test_mcc value: [0.49436016 0.50761192 0.68803296 0.50667099 0.78877005 0.83459145 0.57754011 0.64658323 0.74662021 0.6166353 ] mean value: 0.6407416395771031 key: train_mcc value: [0.91707162 0.92292711 0.91673807 0.91746997 0.91095361 0.92294382 0.93493402 0.91673282 0.91673282 0.87397233] mean value: 0.9150476183677066 key: test_accuracy value: [0.9 0.8875 0.925 0.88607595 0.94936709 0.96202532 0.89873418 0.92405063 0.93670886 0.91139241] mean value: 0.9180854430379747 key: train_accuracy value: [0.98036466 0.98176718 0.98036466 0.98039216 0.9789916 0.98179272 0.98459384 0.98039216 0.98039216 0.97058824] mean value: 0.9799639350831496 key: test_fscore value: [0.42857143 0.57142857 0.72727273 0.57142857 0.81818182 0.84210526 0.63636364 0.66666667 0.7826087 0.66666667] mean value: 0.6711294045390155 key: train_fscore value: [0.92783505 0.93264249 0.92783505 0.92857143 0.92227979 0.93264249 0.94300518 0.92631579 0.92631579 0.88888889] mean value: 0.9256331947686998 key: test_precision value: [1. 0.6 0.8 0.6 0.81818182 1. 
0.63636364 0.85714286 0.75 0.7 ] mean value: 0.7761688311688312 key: train_precision value: [0.95744681 0.96774194 0.94736842 0.94791667 0.95698925 0.96774194 0.97849462 0.97777778 0.97777778 0.94382022] mean value: 0.9623075418440077 key: test_recall value: [0.27272727 0.54545455 0.66666667 0.54545455 0.81818182 0.72727273 0.63636364 0.54545455 0.81818182 0.63636364] mean value: 0.6212121212121212 key: train_recall value: [0.9 0.9 0.90909091 0.91 0.89 0.9 0.91 0.88 0.88 0.84 ] mean value: 0.8919090909090909 key: test_roc_auc value: [0.63636364 0.74374177 0.81862745 0.74331551 0.89438503 0.86363636 0.78877005 0.76537433 0.88703209 0.79612299] mean value: 0.7937369216461289 key: train_roc_auc value: [0.94673736 0.94755302 0.95047379 0.95092834 0.94174267 0.947557 0.95337134 0.93837134 0.93837134 0.91592834] mean value: 0.9431034526817773 key: test_jcc value: [0.27272727 0.4 0.57142857 0.4 0.69230769 0.72727273 0.46666667 0.5 0.64285714 0.5 ] mean value: 0.5173260073260073 key: train_jcc value: [0.86538462 0.87378641 0.86538462 0.86666667 0.85576923 0.87378641 0.89215686 0.8627451 0.8627451 0.8 ] mean value: 0.8618425002562639 MCC on Blind test: 0.61 Accuracy on Blind test: 0.91 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.20900869 0.20622039 0.2010932 0.21744394 0.25689149 0.24539351 0.24098468 0.23988628 0.22156787 0.23667002] mean value: 0.2275160074234009 key: score_time value: [0.04157877 0.03804398 0.04044437 0.02461791 0.0289681 0.02714062 0.02853703 0.02809358 0.02449107 0.02955127] mean value: 0.031146669387817384 key: test_mcc value: [0.51864618 0.66195674 0.79388419 0.72659961 0.84849067 0.77643684 0.36631016 0.72659961 0.47099187 0.66135521] mean value: 0.6551271071223412 key: train_mcc value: [0.95890563 0.96478211 0.97047687 0.95891444 0.95891444 0.96483326 0.95295992 0.9647899 0.97073494 0.97661989] mean value: 0.9641931415166104 key: test_accuracy value: [0.9 0.925 0.95 0.93670886 0.96202532 0.94936709 0.84810127 0.93670886 0.88607595 0.92405063] mean value: 0.9218037974683544 key: train_accuracy value: [0.99018233 0.99158485 0.99298738 0.99019608 0.99019608 0.99159664 0.98879552 0.99159664 0.9929972 0.99439776] mean value: 0.9914530468568914 key: test_fscore value: [0.55555556 0.7 0.81818182 0.76190476 0.86956522 0.8 0.45454545 0.76190476 0.52631579 0.7 ] mean value: 0.6947973358957341 key: train_fscore value: [0.96373057 0.96938776 0.97409326 0.96373057 0.96373057 0.96907216 0.95918367 0.96938776 0.97435897 0.97959184] mean value: 0.9686267133808856 key: test_precision value: [0.71428571 0.77777778 0.9 0.8 0.83333333 0.88888889 0.45454545 0.8 0.625 0.77777778] mean value: 0.7571608946608946 key: train_precision value: [1. 0.98958333 1. 1. 1. 1. 0.97916667 0.98958333 1. 1. 
] mean value: 0.9958333333333333 key: test_recall value: [0.45454545 0.63636364 0.75 0.72727273 0.90909091 0.72727273 0.45454545 0.72727273 0.45454545 0.63636364] mean value: 0.6477272727272727 key: train_recall value: [0.93 0.95 0.94949495 0.93 0.93 0.94 0.94 0.95 0.95 0.96 ] mean value: 0.942949494949495 key: test_roc_auc value: [0.71277997 0.80368906 0.86764706 0.84893048 0.93983957 0.85628342 0.68315508 0.84893048 0.7052139 0.80347594] mean value: 0.8069944974037045 key: train_roc_auc value: [0.965 0.97418434 0.97474747 0.965 0.965 0.97 0.96837134 0.97418567 0.975 0.98 ] mean value: 0.9711488817319649 key: test_jcc value: [0.38461538 0.53846154 0.69230769 0.61538462 0.76923077 0.66666667 0.29411765 0.61538462 0.35714286 0.53846154] mean value: 0.5471773324714502 key: train_jcc value: [0.93 0.94059406 0.94949495 0.93 0.93 0.94 0.92156863 0.94059406 0.95 0.96 ] mean value: 0.9392251695757811 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.61395884 0.5485301 0.56743336 0.51388955 0.55802751 0.559268 0.47061753 0.56458807 0.535954 0.55411768] mean value: 0.5486384630203247 key: score_time value: [0.04558992 0.04537392 0.04560876 0.05530596 0.0599947 0.04393911 0.04824662 0.04597187 0.0459497 0.04629207] mean value: 0.04822726249694824 key: test_mcc value: [ 0.40104758 0.11224603 0. 0. 0.28152101 0.16794369 -0.079909 0.28152101 -0.04554016 0.40070776] mean value: 0.1519537908453421 key: train_mcc value: [0.81042317 0.81042317 0.76854967 0.81045494 0.78418497 0.77753061 0.79738754 0.77083951 0.81694021 0.79080362] mean value: 0.7937537410006497 key: test_accuracy value: [0.8875 0.85 0.85 0.86075949 0.87341772 0.86075949 0.82278481 0.87341772 0.84810127 0.88607595] mean value: 0.8612816455696203 key: train_accuracy value: [0.95652174 0.95652174 0.94810659 0.95658263 0.95098039 0.94957983 0.95378151 0.94817927 0.95798319 0.95238095] mean value: 0.9530617857241073 key: test_fscore value: [0.30769231 0.14285714 0. 0. 0.16666667 0.15384615 0. 0.16666667 0. 0.30769231] mean value: 0.12454212454212456 key: train_fscore value: [0.81656805 0.81656805 0.77018634 0.81656805 0.78787879 0.7804878 0.80239521 0.77300613 0.82352941 0.79518072] mean value: 0.7982368549378833 key: test_precision value: [1. 0.33333333 0. 0. 1. 0.5 0. 1. 0. 1. ] mean value: 0.48333333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.18181818 0.09090909 0. 0. 0.09090909 0.09090909 0. 0.09090909 0. 
0.18181818] mean value: 0.07272727272727272 key: train_recall value: [0.69 0.69 0.62626263 0.69 0.65 0.64 0.67 0.63 0.7 0.66 ] mean value: 0.6646262626262627 key: test_roc_auc value: [0.59090909 0.53096179 0.5 0.5 0.54545455 0.5381016 0.47794118 0.54545455 0.49264706 0.59090909] mean value: 0.5312378904130822 key: train_roc_auc value: [0.845 0.845 0.81313131 0.845 0.825 0.82 0.835 0.815 0.85 0.83 ] mean value: 0.8323131313131313 key: test_jcc value: [0.18181818 0.07692308 0. 0. 0.09090909 0.08333333 0. 0.09090909 0. 0.18181818] mean value: 0.07057109557109557 key: train_jcc value: [0.69 0.69 0.62626263 0.69 0.65 0.64 0.67 0.63 0.7 0.66 ] mean value: 0.6646262626262627 MCC on Blind test: -0.05 Accuracy on Blind test: 0.85 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', 
random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.4557972 1.30422163 1.47492957 1.36501622 1.33721948 1.4778502 1.45930934 1.19137025 0.9904933 0.96908593] mean value: 1.3025293111801148 key: score_time value: [0.0134933 0.01342702 0.01551461 0.01319695 0.01367974 0.01323009 0.00934172 0.01059604 0.00958824 0.0093925 ] mean value: 0.01214601993560791 key: test_mcc value: [0.64666979 0.61736585 0.70588235 0.72659961 0.8307804 0.83459145 0.54627358 0.72659961 0.80762516 0.72659961] mean value: 0.7168987408471386 key: train_mcc value: [0.99417686 0.99417686 1. 0.98248849 0.98834113 0.98837134 0.99417818 0.98250043 0.98837134 0.99417818] mean value: 0.990678278264288 key: test_accuracy value: [0.925 0.9125 0.925 0.93670886 0.94936709 0.96202532 0.89873418 0.93670886 0.94936709 0.93670886] mean value: 0.9332120253164558 key: train_accuracy value: [0.99859748 0.99859748 1. 0.99579832 0.99719888 0.99719888 0.99859944 0.99579832 0.99719888 0.99859944] mean value: 0.9977587107774386 key: test_fscore value: [0.625 0.66666667 0.75 0.76190476 0.84615385 0.84210526 0.6 0.76190476 0.83333333 0.76190476] mean value: 0.7448973395026026 key: train_fscore value: [0.99497487 0.99497487 1. 0.98477157 0.98989899 0.99 0.99497487 0.98492462 0.99 0.99497487] mean value: 0.9919494684106066 key: test_precision value: [1. 0.7 0.75 0.8 0.73333333 1. 0.66666667 0.8 0.76923077 0.8 ] mean value: 0.801923076923077 key: train_precision value: [1. 1. 1. 1. 1. 0.99 1. 0.98989899 0.99 1. ] mean value: 0.996989898989899 key: test_recall value: [0.45454545 0.63636364 0.75 0.72727273 1. 
0.72727273 0.54545455 0.72727273 0.90909091 0.72727273] mean value: 0.7204545454545455 key: train_recall value: [0.99 0.99 1. 0.97 0.98 0.99 0.99 0.98 0.99 0.99] mean value: 0.987 key: test_roc_auc value: [0.72727273 0.79644269 0.85294118 0.84893048 0.97058824 0.86363636 0.75066845 0.84893048 0.93248663 0.84893048] mean value: 0.8440827714485004 key: train_roc_auc value: [0.995 0.995 1. 0.985 0.99 0.99418567 0.995 0.98918567 0.99418567 0.995 ] mean value: 0.9932557003257328 key: test_jcc value: [0.45454545 0.5 0.6 0.61538462 0.73333333 0.72727273 0.42857143 0.61538462 0.71428571 0.61538462] mean value: 0.6004162504162505 key: train_jcc value: [0.99 0.99 1. 0.97 0.98 0.98019802 0.99 0.97029703 0.98019802 0.99 ] mean value: 0.984069306930693 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.09551215 0.06260157 0.07106519 0.04405999 0.12507939 0.04398108 0.07186937 0.04515743 0.07706308 0.09362435] mean value: 0.07300136089324952 key: score_time value: [0.02474332 0.01368976 0.02133179 0.01617622 0.02065301 0.02650046 0.01577353 0.01598001 0.02128029 0.02534318] mean value: 0.020147156715393067 key: test_mcc value: [ 0. 0.02411658 -0.06726728 -0.079909 0.11138831 0.16794369 0.04562045 0.04562045 -0.06482037 -0.079909 ] mean value: 0.010278382103322166 key: train_mcc value: [0.13131382 0.18596753 0.13208296 0.2465605 0.09279817 0.16095705 0.13132856 0.09279817 0.13132856 0.13132856] mean value: 0.14364638833100257 key: test_accuracy value: [0.8625 0.8125 0.825 0.82278481 0.84810127 0.86075949 0.82278481 0.82278481 0.83544304 0.82278481] mean value: 0.8335443037974684 key: train_accuracy value: [0.86255259 0.86535764 0.86395512 0.8697479 0.86134454 0.86414566 0.8627451 0.86134454 0.8627451 0.8627451 ] mean value: 0.8636683284814627 key: test_fscore value: [0. 0.11764706 0. 0. 0.14285714 0.15384615 0.125 0.125 0. 0. ] mean value: 0.06643503555268263 key: train_fscore value: [0.03921569 0.07692308 0.03960396 0.13084112 0.01980198 0.05825243 0.03921569 0.01980198 0.03921569 0.03921569] mean value: 0.05020872914929885 key: test_precision value: [0. 0.16666667 0. 0. 0.33333333 0.5 0.2 0.2 0. 0. ] mean value: 0.14 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0. 0.09090909 0. 0. 
0.09090909 0.09090909 0.09090909 0.09090909 0. 0. ] mean value: 0.045454545454545456 key: train_recall value: [0.02 0.04 0.02020202 0.07 0.01 0.03 0.02 0.01 0.02 0.02 ] mean value: 0.02602020202020202 key: test_roc_auc value: [0.5 0.50922266 0.48529412 0.47794118 0.53074866 0.5381016 0.51604278 0.51604278 0.48529412 0.47794118] mean value: 0.5036629078508874 key: train_roc_auc value: [0.51 0.52 0.51010101 0.535 0.505 0.515 0.51 0.505 0.51 0.51 ] mean value: 0.513010101010101 key: test_jcc value: [0. 0.0625 0. 0. 0.07692308 0.08333333 0.06666667 0.06666667 0. 0. ] mean value: 0.03560897435897436 key: train_jcc value: [0.02 0.04 0.02020202 0.07 0.01 0.03 0.02 0.01 0.02 0.02 ] mean value: 0.02602020202020202 MCC on Blind test: -0.07 Accuracy on Blind test: 0.83 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, 
predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.06339288 0.05783248 0.07663059 0.05685568 0.0568552 0.05862117 0.07298279 0.05087423 0.05057597 0.05194688] mean value: 0.05965678691864014 key: score_time value: [0.03598714 0.02616334 0.02632928 0.0277431 0.02712369 0.03276157 0.03423834 0.03742623 0.02653933 0.03339148] mean value: 0.030770349502563476 key: test_mcc value: [0.49671738 0.43221037 0.79349205 0.77524841 0.71339159 0.66135521 0.66135521 0.57419245 0.47099187 0.59219173] mean value: 0.6171146260529131 key: train_mcc value: [0.75737775 0.75592171 0.71913751 0.70836965 0.72964765 0.74296824 0.75133237 0.70919717 0.73600183 0.73045433] mean value: 0.7340408205100952 key: test_accuracy value: [0.9 0.875 0.95 0.94936709 0.93670886 0.92405063 0.92405063 0.91139241 0.88607595 0.91139241] mean value: 0.9168037974683544 key: train_accuracy value: [0.94530154 0.94530154 0.93828892 0.93557423 0.93977591 0.94257703 0.94397759 0.93557423 0.94117647 0.93977591] mean value: 0.9407323378159118 key: test_fscore value: [0.5 0.5 0.8 0.77777778 0.73684211 0.7 0.7 0.53333333 0.52631579 0.63157895] mean value: 0.6405847953216375 key: train_fscore value: [0.77966102 0.77192982 0.73809524 0.72941176 0.75144509 0.76300578 0.7752809 0.73255814 0.75581395 0.75428571] mean value: 0.7551487417549074 key: test_precision value: [0.8 0.55555556 1. 1. 0.875 0.77777778 0.77777778 1. 
0.625 0.75 ] mean value: 0.8161111111111111 key: train_precision value: [0.8961039 0.92957746 0.89855072 0.88571429 0.89041096 0.90410959 0.88461538 0.875 0.90277778 0.88 ] mean value: 0.8946860081582964 key: test_recall value: [0.36363636 0.45454545 0.66666667 0.63636364 0.63636364 0.63636364 0.63636364 0.36363636 0.45454545 0.54545455] mean value: 0.5393939393939394 key: train_recall value: [0.69 0.66 0.62626263 0.62 0.65 0.66 0.69 0.63 0.65 0.66 ] mean value: 0.6536262626262627 key: test_roc_auc value: [0.67457181 0.69828722 0.83333333 0.81818182 0.81082888 0.80347594 0.80347594 0.68181818 0.7052139 0.75802139] mean value: 0.758720840114702 key: train_roc_auc value: [0.83847471 0.8259217 0.80743099 0.80348534 0.81848534 0.82429967 0.83767101 0.80767101 0.81929967 0.82267101] mean value: 0.820541046038065 key: test_jcc value: [0.33333333 0.33333333 0.66666667 0.63636364 0.58333333 0.53846154 0.53846154 0.36363636 0.35714286 0.46153846] mean value: 0.48122710622710624 key: train_jcc value: [0.63888889 0.62857143 0.58490566 0.57407407 0.60185185 0.61682243 0.63302752 0.57798165 0.60747664 0.60550459] mean value: 0.6069104730652053 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.55796146 0.56648779 0.42351294 0.53978658 0.51576281 0.65841341 0.52290344 0.52209496 0.5138824 0.54724836] mean value: 0.5368054151535034 key: score_time value: [0.04054976 0.03365135 0.02662635 0.02705789 0.02737045 0.02712655 0.02725124 0.02729344 0.02713633 0.02914262] mean value: 0.029320597648620605 key: test_mcc value: [0.49671738 0.43221037 0.79349205 0.77524841 0.71339159 0.66135521 0.66135521 0.57419245 0.47099187 0.59219173] mean value: 0.6171146260529131 key: train_mcc value: [0.75737775 0.75592171 0.71913751 0.70836965 0.72964765 0.74296824 0.75133237 0.70919717 0.73600183 0.73045433] mean value: 0.7340408205100952 key: test_accuracy value: [0.9 0.875 0.95 0.94936709 0.93670886 0.92405063 0.92405063 0.91139241 0.88607595 0.91139241] mean value: 0.9168037974683544 key: train_accuracy value: [0.94530154 0.94530154 0.93828892 0.93557423 0.93977591 0.94257703 0.94397759 0.93557423 0.94117647 0.93977591] mean value: 0.9407323378159118 key: test_fscore value: [0.5 0.5 0.8 0.77777778 0.73684211 0.7 0.7 0.53333333 0.52631579 0.63157895] mean value: 0.6405847953216375 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:115: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:118: SettingWithCopyWarning: A value is 
trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.77966102 0.77192982 0.73809524 0.72941176 0.75144509 0.76300578 0.7752809 0.73255814 0.75581395 0.75428571] mean value: 0.7551487417549074 key: test_precision value: [0.8 0.55555556 1. 1. 0.875 0.77777778 0.77777778 1. 
0.625 0.75 ] mean value: 0.8161111111111111 key: train_precision value: [0.8961039 0.92957746 0.89855072 0.88571429 0.89041096 0.90410959 0.88461538 0.875 0.90277778 0.88 ] mean value: 0.8946860081582964 key: test_recall value: [0.36363636 0.45454545 0.66666667 0.63636364 0.63636364 0.63636364 0.63636364 0.36363636 0.45454545 0.54545455] mean value: 0.5393939393939394 key: train_recall value: [0.69 0.66 0.62626263 0.62 0.65 0.66 0.69 0.63 0.65 0.66 ] mean value: 0.6536262626262627 key: test_roc_auc value: [0.67457181 0.69828722 0.83333333 0.81818182 0.81082888 0.80347594 0.80347594 0.68181818 0.7052139 0.75802139] mean value: 0.758720840114702 key: train_roc_auc value: [0.83847471 0.8259217 0.80743099 0.80348534 0.81848534 0.82429967 0.83767101 0.80767101 0.81929967 0.82267101] mean value: 0.820541046038065 key: test_jcc value: [0.33333333 0.33333333 0.66666667 0.63636364 0.58333333 0.53846154 0.53846154 0.36363636 0.35714286 0.46153846] mean value: 0.48122710622710624 key: train_jcc value: [0.63888889 0.62857143 0.58490566 0.57407407 0.60185185 0.61682243 0.63302752 0.57798165 0.60747664 0.60550459] mean value: 0.6069104730652053 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.09754944 0.10211778 0.10299373 0.10591578 0.14832354 0.09998441 0.10208988 0.12186408 0.10876179 0.10412669] mean value: 0.10937271118164063 key: score_time value: [0.02463031 0.02883983 0.02641749 0.02212977 0.02265787 0.02216053 0.02307367 0.03439331 0.02317548 0.02400279] mean value: 0.02514810562133789 key: test_mcc value: [0.85400682 0.78527876 0.88320546 0.84660737 0.94158382 0.85331034 0.7972271 0.88273483 0.8028464 0.90184995] mean value: 0.8548650838325829 key: train_mcc value: [0.88228271 0.89152742 0.87873745 0.88658274 0.87741393 0.88379172 0.88861386 0.87883096 0.8901769 0.88365546] mean value: 0.8841613145210079 key: test_accuracy value: [0.9270073 0.89051095 0.94160584 0.91970803 0.97058824 0.92647059 0.89705882 0.94117647 0.89705882 0.94852941] mean value: 0.9259714469729498 key: train_accuracy value: [0.9405053 0.94539527 0.93887531 0.94295029 0.93811075 0.94136808 0.94381107 0.93892508 0.94462541 0.94136808] mean value: 0.9415934630424567 key: test_fscore value: [0.92647059 0.8951049 0.94202899 0.92517007 0.97101449 0.92753623 0.90140845 0.94029851 0.90410959 0.95104895] mean value: 0.9284190759769286 key: train_fscore value: [0.94210944 0.94652833 0.94023904 0.944 0.93968254 0.9427663 0.9451074 0.94033413 0.94585987 0.94267516] mean value: 0.9429302207466138 key: test_precision value: [0.92647059 0.85333333 0.94202899 0.87179487 0.95714286 0.91428571 0.86486486 0.95454545 0.84615385 0.90666667] mean value: 0.9037287182530149 key: train_precision value: [0.91808346 0.92801252 0.91900312 0.92621664 0.91640867 0.92080745 0.92379471 
0.91912908 0.92523364 0.92211838] mean value: 0.9218807679243093 key: test_recall value: [0.92647059 0.94117647 0.94202899 0.98550725 0.98529412 0.94117647 0.94117647 0.92647059 0.97058824 1. ] mean value: 0.9559889173060528 key: train_recall value: [0.96742671 0.96579805 0.96247961 0.96247961 0.96416938 0.96579805 0.96742671 0.96254072 0.96742671 0.96416938] mean value: 0.9649714917291475 key: test_roc_auc value: [0.92700341 0.89087809 0.94160273 0.91922421 0.97058824 0.92647059 0.89705882 0.94117647 0.89705882 0.94852941] mean value: 0.9259590792838874 key: train_roc_auc value: [0.94048334 0.94537863 0.93889453 0.94296619 0.93811075 0.94136808 0.94381107 0.93892508 0.94462541 0.94136808] mean value: 0.9415931155049923 key: test_jcc value: [0.8630137 0.81012658 0.89041096 0.86075949 0.94366197 0.86486486 0.82051282 0.88732394 0.825 0.90666667] mean value: 0.8672341001020923 key: train_jcc value: [0.89055472 0.89848485 0.88721805 0.89393939 0.88622754 0.89172932 0.8959276 0.88738739 0.89728097 0.89156627] mean value: 0.892031609941911 MCC on Blind test: 0.53 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [2.06157088 2.21632552 2.12772703 2.1870575 2.24508309 2.5032568 2.55119944 2.44441295 2.43192077 1.92357469]
mean value: 2.2692128658294677
key: score_time
value: [0.02278161 0.034199 0.02382827 0.03283787 0.02307153 0.02474856 0.02306676 0.02312875 0.02295542 0.01745105]
mean value: 0.024806880950927736
key: test_mcc
value: [0.98550725 0.81460896 0.91281179 0.88654289 0.91334626 0.86774089 0.81600218 0.94158382 0.84271225 0.89949371]
mean value: 0.8880350000215755
key: train_mcc
value: [0.93972107 0.94462498 0.92685969 0.93171963 0.93337455 0.94314811 0.94961342 0.92520415 0.93337455 0.93171967]
mean value: 0.9359359827608927
key: test_accuracy
value: [0.99270073 0.90510949 0.95620438 0.94160584 0.95588235 0.93382353 0.90441176 0.97058824 0.91911765 0.94852941]
mean value: 0.9427973379132675
key: train_accuracy
value: [0.96984515 0.97229014 0.96332518 0.96577017 0.96661238 0.97149837 0.9747557 0.96254072 0.96661238 0.96579805]
mean value: 0.9679048233423327
key: test_fscore
value: [0.99270073 0.90909091 0.95588235 0.94444444 0.95714286 0.93430657 0.91034483 0.97014925 0.92307692 0.95035461]
mean value: 0.9447493477213011
key: train_fscore
value: [0.96999189 0.97244733 0.96368039 0.96607431 0.9669088 0.97175141 0.97493937 0.9628433 0.9669088 0.96607431]
mean value: 0.9681619902040669
key: test_precision
value: [0.98550725 0.86666667 0.97014925 0.90666667 0.93055556 0.92753623 0.85714286 0.98484848 0.88 0.91780822]
mean value: 0.9226881182050526
key: train_precision
value: [0.96607431 0.96774194 0.95367412 0.9568 0.9584 0.9632 0.96789727 0.95512821 0.9584 0.95833333]
mean value: 0.9605649180027942
key: test_recall
value: [1. 0.95588235 0.94202899 0.98550725 0.98529412 0.94117647 0.97058824 0.95588235 0.97058824 0.98529412]
mean value: 0.9692242114237
key: train_recall
value: [0.97394137 0.9771987 0.97389886 0.97553018 0.97557003 0.98045603 0.98208469 0.97068404 0.97557003 0.97394137]
mean value: 0.9758875291592053
key: test_roc_auc
value: [0.99275362 0.90547741 0.95630861 0.94128303 0.95588235 0.93382353 0.90441176 0.97058824 0.91911765 0.94852941]
mean value: 0.9428175618073317
key: train_roc_auc
value: [0.96984181 0.97228613 0.96333379 0.96577812 0.96661238 0.97149837 0.9747557 0.96254072 0.96661238 0.96579805]
mean value: 0.9679057446955486
key: test_jcc
value: [0.98550725 0.83333333 0.91549296 0.89473684 0.91780822 0.87671233 0.83544304 0.94202899 0.85714286 0.90540541]
mean value: 0.8963611213537285
key: train_jcc
value: [0.94173228 0.94637224 0.92990654 0.934375 0.9359375 0.94505495 0.9511041 0.92834891 0.9359375 0.934375 ]
mean value: 0.9383144020926913
MCC on Blind test: 0.61
Accuracy on Blind test: 0.89
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])
key: fit_time
value: [0.02025676 0.01463985 0.01642394 0.01580334 0.01426625 0.01391959 0.0137136 0.01319313 0.0138135 0.01418281]
mean value: 0.015021276473999024
key: score_time
value: [0.01330924 0.01044321 0.01135588 0.01076484 0.0106945 0.01014638 0.00978804 0.00987625 0.00961947 0.01057482]
mean value: 0.010657262802124024
key: test_mcc
value: [0.64981886 0.63862773 0.78169078 0.75191816 0.67676337 0.7540057 0.64423542 0.82388584 0.63573029 0.67647059]
mean value: 0.703314674416271
key: train_mcc
value: [0.71960479 0.72838719 0.710723 0.73443309 0.7079266 0.71094685 0.73513731 0.70105507 0.72545745 0.70273867]
mean value: 0.7176410015716607
key: test_accuracy
value: [0.82481752 0.81751825 0.89051095 0.87591241 0.83823529 0.875 0.81617647 0.91176471 0.81617647 0.83823529]
mean value: 0.8504347359381709
key: train_accuracy
value: [0.8590057 0.86389568 0.85493073 0.86715566 0.8534202 0.85504886 0.86726384 0.85016287 0.86237785 0.8509772 ]
mean value: 0.8584238589393373
key: test_fscore
value: [0.82089552 0.82517483 0.89361702 0.87591241 0.8358209 0.88111888 0.83221477 0.91044776 0.82517483 0.83823529]
mean value: 0.8538612199827047
key: train_fscore
value: [0.86367218 0.86671987 0.85828025 0.86822959 0.85736926 0.85850556 0.86991221 0.85350318 0.86533865 0.85441527]
mean value: 0.8615946032444376
key: test_precision
value: [0.83333333 0.78666667 0.875 0.88235294 0.84848485 0.84 0.7654321 0.92424242 0.78666667 0.83823529]
mean value: 0.8380414273453489
key: train_precision
value: [0.83664122 0.84976526 0.83825816 0.86057692 0.83487654 0.83850932 0.85289515 0.83489097 0.84711388 0.83514774]
mean value: 0.8428675171402082
key: test_recall
value: [0.80882353 0.86764706 0.91304348 0.86956522 0.82352941 0.92647059 0.91176471 0.89705882 0.86764706 0.83823529]
mean value: 0.872378516624041
key: train_recall
value: [0.89250814 0.88436482 0.87928222 0.87601958 0.88110749 0.87947883 0.88762215 0.87296417 0.88436482 0.87459283]
mean value: 0.8812305051782497
key: test_roc_auc
value: [0.82470162 0.8178815 0.89034527 0.87595908 0.83823529 0.875 0.81617647 0.91176471 0.81617647 0.83823529]
mean value: 0.8504475703324809
key: train_roc_auc
value: [0.85897838 0.86387898 0.85495056 0.86716288 0.8534202 0.85504886 0.86726384 0.85016287 0.86237785 0.8509772 ]
mean value: 0.8584221615273844
key: test_jcc
value: [0.69620253 0.70238095 0.80769231 0.77922078 0.71794872 0.7875 0.71264368 0.83561644 0.70238095 0.72151899]
mean value: 0.7463105345128135
key: train_jcc
value: [0.76005548 0.76478873 0.75174338 0.76714286 0.75034674 0.75208914 0.76977401 0.74444444 0.76264045 0.74583333]
mean value: 0.756885855885731
MCC on Blind test: 0.15
Accuracy on Blind test: 0.77
Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())])
key: fit_time
value: [0.01524782 0.01856661 0.01861858 0.0185709 0.01848412 0.01891041 0.02782226 0.01901007 0.0183835 0.01848745]
mean value: 0.019210171699523926
key: score_time
value: [0.0127418 0.01292229 0.01289725 0.01281047 0.01282048 0.01274705 0.01290607 0.01298213 0.01289558 0.01290154]
mean value: 0.012862467765808105
key: test_mcc
value: [0.70934757 0.64091263 0.68322489 0.7614264 0.73817324 0.72066617 0.55785938 0.82352941 0.64423542 0.76000982]
mean value: 0.7039384936898134
key: train_mcc
value: [0.7243465 0.71902729 0.71154553 0.70652812 0.69918792 0.71142953 0.71981239 0.70040764 0.71095972 0.69961749]
mean value: 0.7102862122707756
key: test_accuracy
value: [0.8540146 0.81751825 0.83941606 0.87591241 0.86764706 0.86029412 0.77205882 0.91176471 0.81617647 0.875 ]
mean value: 0.8489802490339201
key: train_accuracy
value: [0.8598207 0.85737571 0.85330073 0.85167074 0.84690554 0.85260586 0.85749186 0.84771987 0.8534202 0.84771987]
mean value: 0.8528031081342965
key: test_fscore
value: [0.85714286 0.82758621 0.84931507 0.88590604 0.87323944 0.86131387 0.79470199 0.91176471 0.83221477 0.88435374]
mean value: 0.8577538677268463
key: train_fscore
value: [0.86748844 0.86486486 0.86132512 0.85825545 0.85582822 0.86172651 0.86528099 0.85626441 0.86089645 0.85559846]
mean value: 0.8607528903638494
key: test_precision
value: [0.83333333 0.77922078 0.80519481 0.825 0.83783784 0.85507246 0.72289157 0.91176471 0.7654321 0.82278481]
mean value: 0.8158532400394299
key: train_precision
value: [0.82309942 0.82232012 0.81605839 0.82116244 0.80869565 0.81151079 0.82043796 0.81077147 0.81911765 0.81350954]
mean value: 0.8166683432704045
key: test_recall
value: [0.88235294 0.88235294 0.89855072 0.95652174 0.91176471 0.86764706 0.88235294 0.91176471 0.91176471 0.95588235]
mean value: 0.9060954816709292
key: train_recall
value: [0.91693811 0.91205212 0.91190865 0.89885808 0.90879479 0.91856678 0.91530945 0.90716612 0.90716612 0.90228013]
mean value: 0.9099040336679225
key: test_roc_auc
value: [0.85421995 0.81798806 0.83898124 0.87531969 0.86764706 0.86029412 0.77205882 0.91176471 0.81617647 0.875 ]
mean value: 0.8489450127877238
key: train_roc_auc
value: [0.85977411 0.85733112 0.85334846 0.85170917 0.84690554 0.85260586 0.85749186 0.84771987 0.8534202 0.84771987]
mean value: 0.8528026048004421
key: test_jcc
value: [0.75 0.70588235 0.73809524 0.79518072 0.775 0.75641026 0.65934066 0.83783784 0.71264368 0.79268293]
mean value: 0.7523073672506922
key: train_jcc
value: [0.76598639 0.76190476 0.7564276 0.75170532 0.74798928 0.75704698 0.76255088 0.74865591 0.75576662 0.74763833]
mean value: 0.7555672081895808
MCC on Blind test: 0.13
Accuracy on Blind test: 0.69
Model_name: K-Nearest Neighbors
Model func: KNeighborsClassifier()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())])
key: fit_time
value: [0.01776791 0.0143683 0.01367235 0.01267362 0.01240182 0.01215315 0.01259208 0.01372862 0.01309514 0.0132134 ]
mean value: 0.01356663703918457
key: score_time
value: [0.04105425 0.02325535 0.02105951 0.02271795 0.02249193 0.02242374 0.02348661 0.02194905 0.02194166 0.02291274]
mean value: 0.024329280853271483
key: test_mcc
value: [0.75261265 0.76196863 0.88320546 0.70218993 0.88852332 0.77311134 0.75203572 0.76603235 0.7004012 0.88273483]
mean value: 0.7862815435156082
key: train_mcc
value: [0.85158414 0.85591536 0.84591114 0.84657754 0.84562892 0.85043233 0.84449734 0.86388679 0.85299584 0.83909896]
mean value: 0.8496528341555593
key: test_accuracy
value: [0.87591241 0.87591241 0.94160584 0.84671533 0.94117647 0.88235294 0.875 0.88235294 0.84558824 0.94117647]
mean value: 0.890779304422499
key: train_accuracy
value: [0.92420538 0.92665037 0.92176039 0.92176039 0.9218241 0.9242671 0.92100977 0.93078176 0.92508143 0.91856678]
mean value: 0.9235907472742766
key: test_fscore
value: [0.87769784 0.88435374 0.94202899 0.8590604 0.94444444 0.89041096 0.87943262 0.88571429 0.85714286 0.94202899]
mean value: 0.8962315127241446
key: train_fscore
value: [0.92740047 0.92946708 0.9245283 0.92488263 0.92440945 0.92671395 0.92392157 0.93322859 0.92801252 0.92125984]
mean value: 0.9263824405409482
key: test_precision
value: [0.85915493 0.82278481 0.94202899 0.8 0.89473684 0.83333333 0.84931507 0.86111111 0.79746835 0.92857143]
mean value: 0.858850486325596
key: train_precision
value: [0.89055472 0.89577039 0.892261 0.8887218 0.89481707 0.89770992 0.89107413 0.90136571 0.89307229 0.89176829]
mean value: 0.893711533581153
key: test_recall
value: [0.89705882 0.95588235 0.94202899 0.92753623 1. 0.95588235 0.91176471 0.91176471 0.92647059 0.95588235]
mean value: 0.9384271099744246
key: train_recall
value: [0.96742671 0.96579805 0.95921697 0.96411093 0.95602606 0.95765472 0.95928339 0.96742671 0.96579805 0.95276873]
mean value: 0.9615510306018885
key: test_roc_auc
value: [0.87606564 0.8764919 0.94160273 0.84612106 0.94117647 0.88235294 0.875 0.88235294 0.84558824 0.94117647]
mean value: 0.8907928388746803
key: train_roc_auc
value: [0.92417013 0.92661844 0.92179089 0.92179488 0.9218241 0.9242671 0.92100977 0.93078176 0.92508143 0.91856678]
mean value: 0.9235905277085513
key: test_jcc
value: [0.78205128 0.79268293 0.89041096 0.75294118 0.89473684 0.80246914 0.78481013 0.79487179 0.75 0.89041096]
mean value: 0.8135385202521164
key: train_jcc
value: [0.86462882 0.8682284 0.85964912 0.86026201 0.85944363 0.86343612 0.85860058 0.87481591 0.86569343 0.8540146 ]
mean value: 0.8628772629019651
MCC on Blind test: 0.12
Accuracy on Blind test: 0.8
Model_name: SVM
Model func: SVC(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.06691337 0.06051254 0.07731438 0.07657027 0.07596445 0.07846117 0.07570052 0.07647061 0.07562709 0.07878208] mean value: 0.07423164844512939 key: score_time value: [0.02085137 0.02084017 0.02438045 0.02414894 0.02422881 0.02476311 0.02408767 0.02523947 0.02440548 0.02516818] mean value: 0.02381136417388916 key: test_mcc value: [0.78111679 0.78527876 0.86948194 0.86311873 0.88273483 0.82675403 0.78017138 0.85331034 0.75665657 0.88580789] mean value: 0.8284431268832849 key: train_mcc value: [0.85879004 0.86071236 0.85462057 0.85617638 0.85007907 0.86004287 0.86436688 0.86022618 0.87464084 0.85473165] mean value: 0.8594386838908423 key: test_accuracy value: [0.89051095 0.89051095 0.93430657 0.9270073 0.94117647 0.91176471 0.88970588 0.92647059 0.875 0.94117647] mean value: 0.9127629884070416 key: train_accuracy value: [0.92909535 0.92991035 0.92665037 0.92746536 0.9242671 0.92915309 0.93159609 0.92915309 0.93648208 0.9267101 ] mean value: 0.9290482997910743 key: test_fscore value: [0.89051095 0.8951049 0.93333333 0.93243243 0.94202899 0.91549296 0.89208633 0.92537313 0.88275862 0.94366197] mean value: 0.9152783610813747 key: train_fscore value: [0.93045564 0.93152866 0.92857143 0.92930898 0.92648221 0.93133386 0.93333333 0.93144208 0.93838863 0.92868463] mean value: 0.930952944168937 key: test_precision value: [0.88405797 0.85333333 0.95454545 0.87341772 0.92857143 0.87837838 0.87323944 0.93939394 0.83116883 0.90540541] mean value: 0.892151189994997 key: train_precision value: [0.91365777 0.91121495 0.90417311 0.90557276 0.90015361 0.90352221 0.91021672 0.90229008 
0.91104294 0.90432099] mean value: 0.9066165128215168 key: test_recall value: [0.89705882 0.94117647 0.91304348 1. 0.95588235 0.95588235 0.91176471 0.91176471 0.94117647 0.98529412] mean value: 0.941304347826087 key: train_recall value: [0.94788274 0.95276873 0.954323 0.954323 0.95439739 0.96091205 0.95765472 0.96254072 0.96742671 0.95439739] mean value: 0.9566626459288701 key: test_roc_auc value: [0.8905584 0.89087809 0.93446292 0.92647059 0.94117647 0.91176471 0.88970588 0.92647059 0.875 0.94117647] mean value: 0.9127664109121909 key: train_roc_auc value: [0.92908003 0.92989171 0.9266729 0.92748723 0.9242671 0.92915309 0.93159609 0.92915309 0.93648208 0.9267101 ] mean value: 0.929049343486139 key: test_jcc value: [0.80263158 0.81012658 0.875 0.87341772 0.89041096 0.84415584 0.80519481 0.86111111 0.79012346 0.89333333] mean value: 0.8445505392234164 key: train_jcc value: [0.86995516 0.87183308 0.86666667 0.86795252 0.86303387 0.87149188 0.875 0.87168142 0.88392857 0.86686391] mean value: 0.8708407072769933 MCC on Blind test: 0.44 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [3.54420257 2.05947042 4.31787467 2.78397083 1.90792727 4.28846169 4.24948835 3.81972098 3.66617012 3.65981555] mean value: 3.4297102451324464 key: score_time value: [0.0131371 0.01322865 0.01350379 0.01714802 0.0131588 0.01303959 0.01309347 0.0130167 0.01305389 0.0130465 ] mean value: 0.013542652130126953 key: test_mcc value: [0.9001543 0.8110473 0.92710997 0.85977656 0.91334626 0.81101892 0.88273483 0.89715584 0.78981412 0.91334626] mean value: 0.8705504359250393 key: train_mcc value: [0.94182238 0.9315403 0.96577139 0.93394821 0.89971038 0.9691595 0.9257257 0.93051831 0.95464559 0.95114511] mean value: 0.9403986857141324 key: test_accuracy value: [0.94890511 0.90510949 0.96350365 0.9270073 0.95588235 0.90441176 0.94117647 0.94852941 0.88970588 0.95588235] mean value: 0.9340113782739373 key: train_accuracy value: [0.97066015 0.96577017 0.98288509 0.96658517 0.9495114 0.98452769 0.96172638 0.96416938 0.9771987 0.97557003] mean value: 0.9698604153559036 key: test_fscore value: [0.94656489 0.90647482 0.96350365 0.93150685 0.95714286 0.90780142 0.94029851 0.94890511 0.89795918 0.95714286] mean value: 0.935730013794081 key: train_fscore value: [0.97019868 0.96579805 0.98285714 0.96722622 0.95047923 0.98464026 0.96033755 0.96535433 0.97745572 0.97553018] mean value: 0.9699877354381214 key: test_precision value: [0.98412698 0.88732394 0.97058824 0.88311688 0.93055556 0.87671233 0.95454545 0.94202899 0.83544304 0.93055556] mean value: 0.9194996964105575 key: train_precision value: [0.98653199 0.96579805 0.98366013 0.94827586 0.93260188 0.97752809 
0.99649737 0.93445122 0.96656051 0.97712418] mean value: 0.9669029280790539 key: test_recall value: [0.91176471 0.92647059 0.95652174 0.98550725 0.98529412 0.94117647 0.92647059 0.95588235 0.97058824 0.98529412] mean value: 0.9544970161977835 key: train_recall value: [0.95439739 0.96579805 0.98205546 0.98694943 0.96905537 0.99185668 0.9267101 0.99837134 0.98859935 0.97394137] mean value: 0.9737734535657921 key: test_roc_auc value: [0.94863598 0.90526428 0.96355499 0.92657715 0.95588235 0.90441176 0.94117647 0.94852941 0.88970588 0.95588235] mean value: 0.9339620630861041 key: train_roc_auc value: [0.97067341 0.96577015 0.98288441 0.96660175 0.9495114 0.98452769 0.96172638 0.96416938 0.9771987 0.97557003] mean value: 0.9698633303399206 key: test_jcc value: [0.89855072 0.82894737 0.92957746 0.87179487 0.91780822 0.83116883 0.88732394 0.90277778 0.81481481 0.91780822] mean value: 0.8800572235421898 key: train_jcc value: [0.94212219 0.93385827 0.96629213 0.93653251 0.90563166 0.96974522 0.9237013 0.93302892 0.95590551 0.9522293 ] mean value: 0.9419047007975033 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.15721059 0.12421274 0.11584687 0.1442461 0.16757679 0.12364697 0.1610961 0.1497941 0.11538434 0.16368508] mean value: 0.14226996898651123 key: score_time value: [0.0095911 0.01425385 0.01022911 0.00966549 0.01003146 0.00989795 0.01006746 0.00960088 0.00994182 0.01044965] mean value: 0.01037287712097168 key: test_mcc value: [0.8978896 0.86948194 0.85434012 0.8251228 0.83832595 0.82388584 0.75008111 0.91176471 0.86849267 0.94158382] mean value: 0.858096855393985 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94890511 0.93430657 0.9270073 0.91240876 0.91911765 0.91176471 0.875 0.95588235 0.93382353 0.97058824] mean value: 0.9288804207814513 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94814815 0.9352518 0.92857143 0.91428571 0.91970803 0.91044776 0.87407407 0.95588235 0.9352518 0.97101449] mean value: 0.9292635598287577 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95522388 0.91549296 0.91549296 0.90140845 0.91304348 0.92424242 0.88059701 0.95588235 0.91549296 0.95714286] mean value: 0.9234019332053377 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.94117647 0.95588235 0.94202899 0.92753623 0.92647059 0.89705882 0.86764706 0.95588235 0.95588235 0.98529412] mean value: 0.9354859335038364 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.9488491 0.93446292 0.92689685 0.91229753 0.91911765 0.91176471 0.875 0.95588235 0.93382353 0.97058824] mean value: 0.9288682864450128 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90140845 0.87837838 0.86666667 0.84210526 0.85135135 0.83561644 0.77631579 0.91549296 0.87837838 0.94366197] mean value: 0.8689375646044208 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.20700741 0.2227931 0.21003532 0.20532703 0.20421553 0.22666597 0.20230484 0.22685504 0.20340753 0.21781135] mean value: 0.21264231204986572 key: score_time value: [0.02035546 0.02207088 0.02063417 0.02015138 0.02062988 0.02127051 0.02150893 0.02174425 0.02023482 0.02667117] mean value: 0.02152714729309082 key: test_mcc value: [0.92944673 0.86948194 0.92951942 0.92791659 0.95598573 0.91215932 0.86849267 0.91334626 0.83905224 0.94158382] mean value: 0.9086984732737446 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.96350365 0.93430657 0.96350365 0.96350365 0.97794118 0.95588235 0.93382353 0.95588235 0.91911765 0.97058824] mean value: 0.9538052812365823 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96183206 0.9352518 0.96240602 0.96296296 0.97810219 0.95652174 0.9352518 0.95454545 0.92086331 0.97014925] mean value: 0.9537886582732334 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.91549296 1. 0.98484848 0.97101449 0.94285714 0.91549296 0.984375 0.90140845 0.98484848] mean value: 0.9600337971504919 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92647059 0.95588235 0.92753623 0.94202899 0.98529412 0.97058824 0.95588235 0.92647059 0.94117647 0.95588235] mean value: 0.9487212276214834 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96323529 0.93446292 0.96376812 0.96366155 0.97794118 0.95588235 0.93382353 0.95588235 0.91911765 0.97058824] mean value: 0.95383631713555 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92647059 0.87837838 0.92753623 0.92857143 0.95714286 0.91666667 0.87837838 0.91304348 0.85333333 0.94202899] mean value: 0.9121550326358511 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.86 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.02630854 0.02538991 0.02353644 0.02398157 0.02583075 0.02810669 0.03100824 0.02225041 0.02205133 0.02273846] mean value: 0.025120234489440917 key: score_time value: [0.01718497 0.01694894 0.01576543 0.01617956 0.0177052 0.01829982 0.01477838 0.01297951 0.01446557 0.01420522] mean value: 0.015851259231567383 key: test_mcc value: [0.78111679 0.66616982 0.72918846 0.81031543 0.82495791 0.73656956 0.57408838 0.82352941 0.76470588 0.808911 ] mean value: 0.7519552661564883 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.89051095 0.83211679 0.86131387 0.90510949 0.91176471 0.86764706 0.78676471 0.91176471 0.88235294 0.90441176] mean value: 0.8753756977243452 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.89051095 0.83687943 0.85271318 0.90510949 0.91428571 0.87142857 0.78195489 0.91176471 0.88235294 0.90510949] mean value: 0.875210935791714 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.88405797 0.80821918 0.91666667 0.91176471 0.88888889 0.84722222 0.8 0.91176471 0.88235294 0.89855072] mean value: 0.874948800445332 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.89705882 0.86764706 0.79710145 0.89855072 0.94117647 0.89705882 0.76470588 0.91176471 0.88235294 0.91176471] mean value: 0.8769181585677749 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8905584 0.83237425 0.86178602 0.90515772 0.91176471 0.86764706 0.78676471 0.91176471 0.88235294 0.90441176] mean value: 0.8754582267689685 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.80263158 0.7195122 0.74324324 0.82666667 0.84210526 0.7721519 0.64197531 0.83783784 0.78947368 0.82666667] mean value: 0.7802264343228308 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.32 Accuracy on Blind test: 0.83 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [4.85859299 4.71020794 4.71203661 5.00940704 4.99922633 5.03658009 4.98546529 5.07385755 4.9456017 5.48568153] mean value: 4.981665706634521 key: score_time value: [0.12868524 0.10628939 0.11306834 0.11914802 0.11893177 0.11918879 0.11937928 0.11875987 0.11915421 0.10896921] mean value: 0.11715741157531738 key: test_mcc value: [0.92787101 0.8978896 0.94201665 0.97080136 0.94158382 0.92657079 0.92657079 0.97100831 0.88580789 0.98540068] mean value: 0.9375520891751091 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96350365 0.94890511 0.97080292 0.98540146 0.97058824 0.96323529 0.96323529 0.98529412 0.94117647 0.99264706] mean value: 0.9684789609274367 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96240602 0.94814815 0.97058824 0.98550725 0.97101449 0.96296296 0.96350365 0.98507463 0.94366197 0.99270073] mean value: 0.9685568078831959 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.98461538 0.95522388 0.98507463 0.98550725 0.95714286 0.97014925 0.95652174 1. 0.90540541 0.98550725] mean value: 0.9685147640241735 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.94117647 0.94117647 0.95652174 0.98550725 0.98529412 0.95588235 0.97058824 0.97058824 0.98529412 1. ] mean value: 0.9692028985507246 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96334186 0.9488491 0.97090793 0.98540068 0.97058824 0.96323529 0.96323529 0.98529412 0.94117647 0.99264706] mean value: 0.9684676044330777 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92753623 0.90140845 0.94285714 0.97142857 0.94366197 0.92857143 0.92957746 0.97058824 0.89333333 0.98550725] mean value: 0.9394470077069407 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.88 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, 
tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.55692935 3.20952964 3.28124666 1.61803484 1.44277549 1.39598799 1.42690277 1.50912189 1.46272874 1.47186947] mean value: 1.8375126838684082 key: score_time value: [0.18893909 0.23083615 0.22105742 0.13914585 0.21864581 0.16200542 0.13056111 0.18395448 0.22659445 0.13184023] mean value: 0.18335800170898436 key: test_mcc value: [0.95629932 0.91240409 0.95630861 0.94160273 0.91176471 0.92657079 0.91215932 0.95598573 0.87000211 0.95598573] mean value: 0.9299083136889414 key: train_mcc value: [0.97392522 0.97555137 0.97556187 0.97392011 0.97070464 0.97232431 0.97557133 0.97070464 0.97882866 0.97068919] mean value: 0.973778132952908 key: test_accuracy value: [0.97810219 0.95620438 0.97810219 0.97080292 0.95588235 0.96323529 0.95588235 0.97794118 0.93382353 0.97794118] mean value: 0.9647917561185058 key: train_accuracy value: [0.98696007 0.98777506 0.98777506 0.98696007 0.98534202 0.98615635 0.98778502 0.98534202 0.98941368 0.98534202] mean value: 0.9868851360140594 key: test_fscore value: [0.97777778 0.95588235 0.97810219 0.97101449 0.95588235 0.96296296 0.95652174 0.97777778 0.93617021 0.97810219] mean value: 0.9650194048612931 key: train_fscore value: [0.98699187 0.98779496 0.98779496 0.98694943 0.98538961 0.98619009 0.98779496 0.98538961 0.98940505 0.98536585] mean value: 0.9869066381471465 key: test_precision value: [0.98507463 0.95588235 0.98529412 0.97101449 0.95588235 0.97014925 0.94285714 0.98507463 0.90410959 0.97101449] mean value: 0.9626353048397583 key: train_precision value: [0.98538961 
0.98699187 0.98538961 0.98694943 0.98220065 0.98379254 0.98699187 0.98220065 0.99021207 0.98376623] mean value: 0.9853884534267398 key: test_recall value: [0.97058824 0.95588235 0.97101449 0.97101449 0.95588235 0.95588235 0.97058824 0.97058824 0.97058824 0.98529412] mean value: 0.9677323103154305 key: train_recall value: [0.98859935 0.98859935 0.99021207 0.98694943 0.98859935 0.98859935 0.98859935 0.98859935 0.98859935 0.98697068] mean value: 0.9884327624594162 key: test_roc_auc value: [0.97804774 0.95620205 0.97815431 0.97080136 0.95588235 0.96323529 0.95588235 0.97794118 0.93382353 0.97794118] mean value: 0.9647911338448424 key: train_roc_auc value: [0.98695873 0.98777439 0.98777705 0.98696006 0.98534202 0.98615635 0.98778502 0.98534202 0.98941368 0.98534202] mean value: 0.9868851326577786 key: test_jcc value: [0.95652174 0.91549296 0.95714286 0.94366197 0.91549296 0.92857143 0.91666667 0.95652174 0.88 0.95714286] mean value: 0.9327215175108623 key: train_jcc value: [0.97431782 0.97588424 0.97588424 0.9742351 0.9712 0.97275641 0.97588424 0.9712 0.97903226 0.97115385] mean value: 0.9741548169278077 MCC on Blind test: 0.45 Accuracy on Blind test: 0.86 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.04003906 0.04326582 0.03964067 0.04014897 0.04897428 0.07177114 0.09451461 0.06656861 0.07353854 0.06714416] mean value: 0.05856058597564697 key: score_time value: [0.02529025 0.02421188 0.02648306 0.0236609 0.02287126 0.0362916 0.03969383 0.04353642 0.03938127 0.03975558] mean value: 0.032117605209350586 key: test_mcc value: [0.70934757 0.64091263 0.68322489 0.7614264 0.73817324 0.72066617 0.55785938 0.82352941 0.64423542 0.76000982] mean value: 0.7039384936898134 key: train_mcc value: [0.7243465 0.71902729 0.71154553 0.70652812 0.69918792 0.71142953 0.71981239 0.70040764 0.71095972 0.69961749] mean value: 0.7102862122707756 key: test_accuracy value: [0.8540146 0.81751825 0.83941606 0.87591241 0.86764706 0.86029412 0.77205882 0.91176471 0.81617647 0.875 ] mean value: 0.8489802490339201 key: train_accuracy value: [0.8598207 0.85737571 0.85330073 0.85167074 0.84690554 0.85260586 0.85749186 0.84771987 0.8534202 0.84771987] mean value: 0.8528031081342965 key: test_fscore value: [0.85714286 0.82758621 0.84931507 0.88590604 0.87323944 0.86131387 0.79470199 0.91176471 0.83221477 0.88435374] mean value: 0.8577538677268463 key: train_fscore value: [0.86748844 0.86486486 0.86132512 0.85825545 0.85582822 0.86172651 0.86528099 0.85626441 0.86089645 0.85559846] mean value: 0.8607528903638494 key: test_precision value: [0.83333333 0.77922078 0.80519481 0.825 0.83783784 0.85507246 0.72289157 0.91176471 0.7654321 0.82278481] mean value: 0.8158532400394299 key: train_precision value: [0.82309942 0.82232012 0.81605839 0.82116244 0.80869565 0.81151079 0.82043796 0.81077147 0.81911765 
0.81350954] mean value: 0.8166683432704045 key: test_recall value: [0.88235294 0.88235294 0.89855072 0.95652174 0.91176471 0.86764706 0.88235294 0.91176471 0.91176471 0.95588235] mean value: 0.9060954816709292 key: train_recall value: [0.91693811 0.91205212 0.91190865 0.89885808 0.90879479 0.91856678 0.91530945 0.90716612 0.90716612 0.90228013] mean value: 0.9099040336679225 key: test_roc_auc value: [0.85421995 0.81798806 0.83898124 0.87531969 0.86764706 0.86029412 0.77205882 0.91176471 0.81617647 0.875 ] mean value: 0.8489450127877238 key: train_roc_auc value: [0.85977411 0.85733112 0.85334846 0.85170917 0.84690554 0.85260586 0.85749186 0.84771987 0.8534202 0.84771987] mean value: 0.8528026048004421 key: test_jcc value: [0.75 0.70588235 0.73809524 0.79518072 0.775 0.75641026 0.65934066 0.83783784 0.71264368 0.79268293] mean value: 0.7523073672506922 key: train_jcc value: [0.76598639 0.76190476 0.7564276 0.75170532 0.74798928 0.75704698 0.76255088 0.74865591 0.75576662 0.74763833] mean value: 0.7555672081895808 MCC on Blind test: 0.13 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [9.29590559 7.42686653 8.49374866 7.30686116 8.50415301 6.25112653 3.41065454 2.65957332 6.32901025 8.18140841] mean value: 6.785930800437927 key: score_time value: [0.02076149 0.03504729 0.02796507 0.03136373 0.01840234 0.02318001 0.01379251 0.01399994 0.03346467 0.01805067] mean value: 0.023602771759033202 key: test_mcc value: [0.98550418 0.91281179 0.89863497 0.95629932 0.95681396 0.92657079 0.89715584 0.95598573 0.91533482 0.95681396] mean value: 0.9361925353808127 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.99270073 0.95620438 0.94890511 0.97810219 0.97794118 0.96323529 0.94852941 0.97794118 0.95588235 0.97794118] mean value: 0.9677382996994418 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.99259259 0.95652174 0.95035461 0.97841727 0.97841727 0.96350365 0.94890511 0.97810219 0.95774648 0.97841727] mean value: 0.9682978167991605 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.94285714 0.93055556 0.97142857 0.95774648 0.95652174 0.94202899 0.97101449 0.91891892 0.95774648] mean value: 0.9548818363897972 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.98529412 0.97058824 0.97101449 0.98550725 1. 0.97058824 0.95588235 0.98529412 1. 1. ] mean value: 0.9824168797953965 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 
1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.99264706 0.95630861 0.94874254 0.97804774 0.97794118 0.96323529 0.94852941 0.97794118 0.95588235 0.97794118] mean value: 0.9677216538789429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.98529412 0.91666667 0.90540541 0.95774648 0.95774648 0.92957746 0.90277778 0.95714286 0.91891892 0.95774648] mean value: 0.9389022644967135 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.74 Accuracy on Blind test: 0.94 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.12504745 0.15010977 0.14459181 0.12664557 0.143291 0.1131978 0.13656402 0.12558675 0.13988376 0.12933922] mean value: 0.13342571258544922 key: score_time value: [0.02595425 0.02889538 0.01287436 0.02456045 0.02117848 0.02104068 0.02054429 0.03904009 0.02880216 0.04108047] mean value: 0.026397061347961426 key: test_mcc value: [0.94199209 0.85739162 0.89863497 0.88654289 0.84271225 0.86849267 0.78632938 0.89715584 0.88580789 0.8722811 ] mean value: 0.8737340702606493 key: train_mcc value: [0.91751286 0.92541695 0.91414846 0.92215919 0.92047016 0.92054835 0.92539568 0.91263814 0.92073412 0.91728977] mean value: 0.9196313681335729 key: 
test_accuracy value: [0.97080292 0.9270073 0.94890511 0.94160584 0.91911765 0.93382353 0.88970588 0.94852941 0.94117647 0.93382353] mean value: 0.9354497638471446 key: train_accuracy value: [0.95843521 0.96251019 0.95680522 0.9608802 0.96009772 0.96009772 0.96254072 0.95602606 0.96009772 0.95846906] mean value: 0.9595959797073979 key: test_fscore value: [0.97014925 0.92957746 0.95035461 0.94444444 0.92307692 0.9352518 0.89655172 0.94814815 0.94366197 0.93706294] mean value: 0.9378279275711674 key: train_fscore value: [0.95923261 0.96308186 0.957498 0.96141479 0.96057924 0.96064257 0.96302251 0.9568 0.96076861 0.95903614] mean value: 0.9602076343607397 key: test_precision value: [0.98484848 0.89189189 0.93055556 0.90666667 0.88 0.91549296 0.84415584 0.95522388 0.90540541 0.89333333] mean value: 0.9107574020200675 key: train_precision value: [0.94191523 0.94936709 0.94164038 0.94770206 0.9491256 0.94770206 0.95079365 0.94025157 0.94488189 0.94611727] mean value: 0.9459496798466626 key: test_recall value: [0.95588235 0.97058824 0.97101449 0.98550725 0.97058824 0.95588235 0.95588235 0.94117647 0.98529412 0.98529412] mean value: 0.9677109974424553 key: train_recall value: [0.9771987 0.9771987 0.97389886 0.97553018 0.9723127 0.97394137 0.97557003 0.97394137 0.9771987 0.9723127 ] mean value: 0.9749103304621368 key: test_roc_auc value: [0.9706948 0.9273231 0.94874254 0.94128303 0.91911765 0.93382353 0.88970588 0.94852941 0.94117647 0.93382353] mean value: 0.9354219948849105 key: train_roc_auc value: [0.9584199 0.96249821 0.95681914 0.96089213 0.96009772 0.96009772 0.96254072 0.95602606 0.96009772 0.95846906] mean value: 0.9595958361451928 key: test_jcc value: [0.94202899 0.86842105 0.90540541 0.89473684 0.85714286 0.87837838 0.8125 0.90140845 0.89333333 0.88157895] mean value: 0.8834934252576709 key: train_jcc value: [0.92165899 0.92879257 0.91846154 0.92569659 0.92414861 0.92426584 0.92868217 0.91717791 0.92449923 0.9212963 ] mean value: 0.9234679748417127 MCC on Blind 
test: 0.58 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', 
RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0395906 0.04507875 0.05771208 0.04412246 0.0433681 0.04452419 0.05319262 0.03797746 0.05140471 0.03866053] mean value: 0.045563149452209475 key: score_time value: [0.03348851 0.02256203 0.02129936 0.0222764 0.0238862 0.03728747 0.0332911 0.0254643 0.02613497 0.02396679] mean value: 0.02696571350097656 key: test_mcc value: [0.70801364 0.59804827 0.73858362 0.73747083 0.76503685 0.7972271 0.69305253 0.75008111 0.69125122 0.85628096] mean value: 0.7335046136648194 key: train_mcc value: [0.74611094 0.7540916 0.73661201 0.73642248 0.73671937 0.73495768 0.73997919 0.73002908 0.73997919 0.72517997] mean value: 0.738008152481437 key: test_accuracy value: [0.8540146 0.79562044 0.86861314 0.86861314 0.88235294 0.89705882 0.84558824 0.875 0.84558824 0.92647059] mean value: 0.8658920137398025 key: train_accuracy value: [0.87286064 0.87693562 0.86797066 0.86797066 0.86807818 0.86726384 0.86970684 0.86482085 0.86970684 0.86237785] mean value: 0.868769196870628 key: test_fscore value: [0.85294118 0.80821918 0.86567164 0.87142857 0.88059701 0.90140845 0.85106383 0.87591241 0.84671533 0.92957746] mean value: 0.8683535065204239 key: train_fscore value: [0.875 0.87851971 0.87060703 0.87019231 
0.87060703 0.8694956 0.87220447 0.86698718 0.87220447 0.86469175] mean value: 0.8710509550632397 key: test_precision value: [0.85294118 0.75641026 0.89230769 0.85915493 0.89393939 0.86486486 0.82191781 0.86956522 0.84057971 0.89189189] mean value: 0.8543572941217562 key: train_precision value: [0.86119874 0.86804452 0.85289515 0.85511811 0.85423197 0.85511811 0.85579937 0.8533123 0.85579937 0.8503937 ] mean value: 0.8561911347045577 key: test_recall value: [0.85294118 0.86764706 0.84057971 0.88405797 0.86764706 0.94117647 0.88235294 0.88235294 0.85294118 0.97058824] mean value: 0.884228473998295 key: train_recall value: [0.88925081 0.88925081 0.88907015 0.8858075 0.88762215 0.88436482 0.88925081 0.88110749 0.88925081 0.87947883] mean value: 0.8864454198128496 key: test_roc_auc value: [0.85400682 0.79614237 0.86881927 0.86849957 0.88235294 0.89705882 0.84558824 0.875 0.84558824 0.92647059] mean value: 0.8659526854219949 key: train_roc_auc value: [0.87284727 0.87692557 0.86798784 0.86798519 0.86807818 0.86726384 0.86970684 0.86482085 0.86970684 0.86237785] mean value: 0.8687700261967893 key: test_jcc value: [0.74358974 0.67816092 0.76315789 0.7721519 0.78666667 0.82051282 0.74074074 0.77922078 0.73417722 0.86842105] mean value: 0.7686799731563452 key: train_jcc value: [0.77777778 0.78335725 0.7708628 0.77021277 0.7708628 0.76912181 0.7733711 0.76520509 0.7733711 0.76163611] mean value: 0.771577861199781 MCC on Blind test: 0.33 Accuracy on Blind test: 0.75 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02968049 0.07493019 0.06887722 0.09048057 0.08637094 0.05483675 0.0843811 0.06575966 0.05945992 0.07291675] mean value: 0.06876935958862304 key: score_time value: [0.0133028 0.04360104 0.03298616 0.02097654 0.01297069 0.02053523 0.03294492 0.01312232 0.02019286 0.020895 ] mean value: 0.023152756690979003 key: test_mcc value: [0.69429215 0.75857279 0.78324384 0.87308606 0.81150267 0.86849267 0.72627304 0.92657079 0.60999428 0.79909587] mean value: 0.7851124168520672 key: train_mcc value: [0.66621184 0.8643186 0.82741367 0.90240059 0.77738194 0.91019134 0.81688487 0.8969845 0.76596494 0.7890932 ] mean value: 0.821684548538899 key: test_accuracy value: [0.82481752 0.86861314 0.88321168 0.93430657 0.89705882 0.93382353 0.85294118 0.96323529 0.79411765 0.88970588] mean value: 0.8841831258050665 key: train_accuracy value: [0.80929095 0.92909535 0.90953545 0.95028525 0.87785016 0.95439739 0.9014658 0.94788274 0.87214984 0.88517915] mean value: 0.903713209039818 key: test_fscore value: [0.85 0.88157895 0.87096774 0.93793103 0.90666667 0.9352518 0.86842105 0.96296296 0.76271186 0.90066225] mean value: 0.8877154320671432 key: train_fscore value: [0.83928571 0.93312836 0.90254609 0.95177866 0.89067055 0.95562599 0.90976883 0.94920635 0.85503232 0.89639971] mean value: 0.9083442572874197 key: test_precision value: [0.73913043 0.79761905 0.98181818 0.89473684 0.82926829 0.91549296 0.78571429 0.97014925 0.9 0.81927711] mean value: 0.8633206404633871 key: train_precision value: [0.72565321 0.88355167 0.97718631 0.92331288 0.8060686 0.93055556 
0.83906465 0.92569659 0.98720682 0.81659973] mean value: 0.8814896031917655 key: test_recall value: [1. 0.98529412 0.7826087 0.98550725 1. 0.95588235 0.97058824 0.95588235 0.66176471 1. ] mean value: 0.9297527706734868 key: train_recall value: [0.99511401 0.98859935 0.83849918 0.98205546 0.99511401 0.98208469 0.99348534 0.97394137 0.75407166 0.99348534] mean value: 0.9496450414738218 key: test_roc_auc value: [0.82608696 0.86945865 0.88395141 0.93393009 0.89705882 0.93382353 0.85294118 0.96323529 0.79411765 0.88970588] mean value: 0.8844309462915602 key: train_roc_auc value: [0.80913938 0.92904682 0.90947761 0.95031112 0.87785016 0.95439739 0.9014658 0.94788274 0.87214984 0.88517915] mean value: 0.9036900011158876 key: test_jcc value: [0.73913043 0.78823529 0.77142857 0.88311688 0.82926829 0.87837838 0.76744186 0.92857143 0.61643836 0.81927711] mean value: 0.8021286608141679 key: train_jcc value: [0.72307692 0.87463977 0.8224 0.90799397 0.80289093 0.91502276 0.83447332 0.90332326 0.74677419 0.81225033] mean value: 0.8342845467581183 MCC on Blind test: 0.52 Accuracy on Blind test: 0.85 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.05821252 0.07661438 0.08219218 0.11488438 0.11355114 0.05209708 0.07517815 0.06350946 0.06845737 0.0433135 ] mean value: 0.07480101585388184 key: score_time value: [0.02136469 0.02088022 0.02089167 0.04012966 0.02557182 0.02096868 0.0201993 0.01298022 0.01303434 0.02602172] mean value: 0.02220423221588135 key: test_mcc value: [0.90025835 0.75258453 0.89869927 0.9001543 0.71492035 0.76894131 0.79549513 0.84567499 0.78144702 0.76249285] mean value: 0.812066810336383 key: train_mcc value: [0.8810362 0.8013501 0.9022761 0.9186774 0.71018517 0.70887969 0.8969845 0.77059101 0.89396869 0.72834633] mean value: 0.8212295193778039 key: test_accuracy value: [0.94890511 0.86131387 0.94890511 0.94890511 0.83823529 0.875 0.89705882 0.91911765 0.88235294 0.86764706] mean value: 0.8987440961786174 key: train_accuracy value: [0.93887531 0.89242054 0.95110024 0.9592502 0.83631922 0.83550489 0.94788274 0.8737785 0.94543974 0.8485342 ] mean value: 0.9029105575156163 key: test_fscore value: [0.95035461 0.87741935 0.94814815 0.95104895 0.86075949 0.88741722 0.89393939 0.92413793 0.89333333 0.88311688] mean value: 0.9069675317602913 key: train_fscore value: [0.94145199 0.90236686 0.95073892 0.95961228 0.85894737 0.85834502 0.94648829 0.88743646 0.94761532 0.86770982] mean value: 0.9120712328049019 key: test_precision value: [0.91780822 0.7816092 0.96969697 0.91891892 0.75555556 0.80722892 0.921875 0.87012987 0.81707317 0.79069767] mean value: 0.8550593489694658 key: train_precision value: [0.90404798 0.82655827 0.95702479 0.9504 0.75462392 0.75369458 0.97250859 
0.80078637 0.9112782 0.77020202] mean value: 0.8601124713698691 key: test_recall value: [0.98529412 1. 0.92753623 0.98550725 1. 0.98529412 0.86764706 0.98529412 0.98529412 1. ] mean value: 0.9721867007672634 key: train_recall value: [0.98208469 0.99348534 0.94453507 0.96900489 0.99674267 0.99674267 0.9218241 0.99511401 0.98697068 0.99348534] mean value: 0.9779989478774224 key: test_roc_auc value: [0.9491688 0.86231884 0.94906223 0.94863598 0.83823529 0.875 0.89705882 0.91911765 0.88235294 0.86764706] mean value: 0.8988597612958227 key: train_roc_auc value: [0.93884006 0.8923381 0.9510949 0.95925815 0.83631922 0.83550489 0.94788274 0.8737785 0.94543974 0.8485342 ] mean value: 0.9028990493700548 key: test_jcc value: [0.90540541 0.7816092 0.90140845 0.90666667 0.75555556 0.79761905 0.80821918 0.85897436 0.80722892 0.79069767] mean value: 0.8313384448491006 key: train_jcc value: [0.88938053 0.82210243 0.90610329 0.92236025 0.75276753 0.75184275 0.8984127 0.79765013 0.90044577 0.76633166] mean value: 0.8407397023682444 MCC on Blind test: 0.41 Accuracy on Blind test: 0.77 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.67593026 0.74290133 0.6617012 0.73324609 0.68701911 0.72807765 0.84482217 0.75904655 0.80371714 0.76402116] mean value: 0.7400482654571533 key: score_time value: [0.0317471 0.0234251 0.02291584 0.02281976 0.02341962 0.02329421 0.03307962 0.0250566 0.02536154 0.02511978] mean value: 0.02562391757965088 key: test_mcc value: [0.92787101 0.89791134 0.89863497 0.84156943 0.92657079 0.88273483 0.82352941 0.92657079 0.89949371 0.91533482] mean value: 0.8940221094416068 key: train_mcc value: [0.95764031 0.94948177 0.95764076 0.95602223 0.96091715 0.95444297 0.96255221 0.95114007 0.96254199 0.9527801 ] mean value: 0.9565159566173079 key: test_accuracy value: [0.96350365 0.94890511 0.94890511 0.91970803 0.96323529 0.94117647 0.91176471 0.96323529 0.94852941 0.95588235] mean value: 0.9464845427221984 key: train_accuracy value: [0.97881011 0.97473513 0.97881011 0.97799511 0.98045603 0.9771987 0.98127036 0.97557003 0.98127036 0.97638436] mean value: 0.9782500285381309 key: test_fscore value: [0.96240602 0.94890511 0.95035461 0.92307692 0.96350365 0.94202899 0.91176471 0.96350365 0.95035461 0.95774648] mean value: 0.9473644736994636 key: train_fscore value: [0.9788961 0.97469388 0.97886179 0.97806661 0.98042414 0.97730956 0.981316 0.97557003 0.9812856 0.97644192] mean value: 0.9782865639540559 key: test_precision value: [0.98461538 0.94202899 0.93055556 0.89189189 0.95652174 0.92857143 0.91176471 0.95652174 0.91780822 0.91891892] mean value: 0.933919856838173 key: train_precision value: [0.97572816 0.97708674 0.97568882 0.97411003 0.98202614 0.97258065 
0.97893031 0.97557003 0.9804878 0.97406807] mean value: 0.9766276753260145 key: test_recall value: [0.94117647 0.95588235 0.97101449 0.95652174 0.97058824 0.95588235 0.91176471 0.97058824 0.98529412 1. ] mean value: 0.9618712702472293 key: train_recall value: [0.98208469 0.9723127 0.98205546 0.98205546 0.97882736 0.98208469 0.98371336 0.97557003 0.98208469 0.97882736] mean value: 0.9799615815846666 key: test_roc_auc value: [0.96334186 0.94895567 0.94874254 0.91943734 0.96323529 0.94117647 0.91176471 0.96323529 0.94852941 0.95588235] mean value: 0.9464300937766411 key: train_roc_auc value: [0.97880743 0.9747371 0.97881275 0.97799842 0.98045603 0.9771987 0.98127036 0.97557003 0.98127036 0.97638436] mean value: 0.9782505539584783 key: test_jcc value: [0.92753623 0.90277778 0.90540541 0.85714286 0.92957746 0.89041096 0.83783784 0.92957746 0.90540541 0.91891892] mean value: 0.9004590322853835 key: train_jcc value: [0.95866455 0.95063694 0.95859873 0.95707472 0.9616 0.95562599 0.96331738 0.95230525 0.96325879 0.95396825] mean value: 0.9575050598665193 MCC on Blind test: 0.66 Accuracy on Blind test: 0.92 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.35942435 0.37208867 0.38434052 0.42646241 0.40576959 0.39391208 0.37551808 0.40830898 0.34282279 0.383919 ] mean value: 0.38525664806365967 key: score_time value: [0.02972317 0.03043103 0.03320718 0.03213596 0.03038836 0.02972698 0.03265667 0.03124976 0.03130412 0.03155637] mean value: 0.03123795986175537 key: test_mcc value: [0.8978896 0.89791134 0.92791659 0.88360693 0.89715584 0.89715584 0.83905224 0.8979331 0.88388348 0.91215932] mean value: 0.8934664277275621 key: train_mcc value: [0.99188303 0.99837134 0.9886543 0.9967453 0.99186852 0.98860066 0.98860066 0.99185799 0.99349061 0.99185799] mean value: 0.9921930405041824 key: test_accuracy value: [0.94890511 0.94890511 0.96350365 0.94160584 0.94852941 0.94852941 0.91911765 0.94852941 0.94117647 0.95588235] mean value: 0.9464684413911549 key: train_accuracy value: [0.99592502 0.999185 0.99429503 0.99837001 0.99592834 0.99429967 0.99429967 0.99592834 0.99674267 0.99592834] mean value: 0.9960902096955313 key: test_fscore value: [0.94814815 0.94890511 0.96296296 0.94117647 0.94890511 0.94890511 0.92086331 0.94964029 0.94285714 0.95652174] mean value: 0.946888538927638 key: train_fscore value: [0.99591169 0.999185 0.99425759 0.99836601 0.99591837 0.99429503 0.99429503 0.99593165 0.99673736 0.99593165] mean value: 0.9960829383048007 key: test_precision value: [0.95522388 0.94202899 0.98484848 0.95522388 0.94202899 0.94202899 0.90140845 0.92957746 0.91666667 0.94285714] mean value: 0.941189292758102 key: train_precision value: [1. 1. 1. 1. 
0.99836334 0.99510604 0.99510604 0.99512195 0.99836601 0.99512195] mean value: 0.9977185326077931 key: test_recall value: [0.94117647 0.95588235 0.94202899 0.92753623 0.95588235 0.95588235 0.94117647 0.97058824 0.97058824 0.97058824] mean value: 0.9531329923273657 key: train_recall value: [0.99185668 0.99837134 0.98858075 0.99673736 0.99348534 0.99348534 0.99348534 0.99674267 0.99511401 0.99674267] mean value: 0.994460149528936 key: test_roc_auc value: [0.9488491 0.94895567 0.96366155 0.94170929 0.94852941 0.94852941 0.91911765 0.94852941 0.94117647 0.95588235] mean value: 0.946494032395567 key: train_roc_auc value: [0.99592834 0.99918567 0.99429038 0.99836868 0.99592834 0.99429967 0.99429967 0.99592834 0.99674267 0.99592834] mean value: 0.9960900096178882 key: test_jcc value: [0.90140845 0.90277778 0.92857143 0.88888889 0.90277778 0.90277778 0.85333333 0.90410959 0.89189189 0.91666667] mean value: 0.8993203582430864 key: train_jcc value: [0.99185668 0.99837134 0.98858075 0.99673736 0.99186992 0.98865478 0.98865478 0.99189627 0.99349593 0.99189627] mean value: 0.9922014081324269 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [1.45340419 1.60676646 1.59989524 1.58247232 1.4779551 1.49836159 1.49765706 1.47903895 1.51418066 1.45867777] mean value: 1.516840934753418 key: score_time value: [0.08413672 0.05650353 0.07431793 0.08095384 0.07365823 0.07279754 0.07393479 0.08042669 0.06745076 0.0800302 ] mean value: 0.07442102432250977 key: test_mcc value: [0.73747083 0.77188355 0.92709446 0.87086187 0.8979331 0.82675403 0.7540057 0.79411765 0.73656956 0.92737353] mean value: 0.8244064275741751 key: train_mcc value: [0.95605202 0.96748206 0.9593539 0.95780409 0.95608819 0.95631148 0.96103952 0.96435357 0.96260328 0.9528711 ] mean value: 0.9593959205340611 key: test_accuracy value: [0.86861314 0.88321168 0.96350365 0.93430657 0.94852941 0.91176471 0.875 0.89705882 0.86764706 0.96323529] mean value: 0.9112870330613998 key: train_accuracy value: [0.97799511 0.98370008 0.9796251 0.97881011 0.97801303 0.97801303 0.98045603 0.98208469 0.98127036 0.97638436] mean value: 0.9796351897719339 key: test_fscore value: [0.86567164 0.88888889 0.96402878 0.93706294 0.94964029 0.91549296 0.88111888 0.89705882 0.87142857 0.96402878] mean value: 0.9134420543292832 key: train_fscore value: [0.97813765 0.98381877 0.97975709 0.97899838 0.97813765 0.97827836 0.98061389 0.98225806 0.98137652 0.97655618] mean value: 0.9797932562619014 key: test_precision value: [0.87878788 0.84210526 0.95714286 0.90540541 0.92957746 0.87837838 0.84 0.89705882 0.84722222 0.94366197] mean value: 0.8919340265243767 key: train_precision value: [0.9726248 0.97749196 0.97266881 0.9696 0.9726248 0.96661367 0.97275641 
0.97284345 0.97584541 0.96950241] mean value: 0.9722571720692034 key: test_recall value: [0.85294118 0.94117647 0.97101449 0.97101449 0.97058824 0.95588235 0.92647059 0.89705882 0.89705882 0.98529412] mean value: 0.936849957374254 key: train_recall value: [0.98371336 0.99022801 0.98694943 0.98858075 0.98371336 0.99022801 0.98859935 0.99185668 0.98697068 0.98371336] mean value: 0.9874552980748282 key: test_roc_auc value: [0.86849957 0.88363171 0.96344842 0.93403666 0.94852941 0.91176471 0.875 0.89705882 0.86764706 0.96323529] mean value: 0.9112851662404092 key: train_roc_auc value: [0.97799045 0.98369476 0.97963107 0.97881806 0.97801303 0.97801303 0.98045603 0.98208469 0.98127036 0.97638436] mean value: 0.9796355829981243 key: test_jcc value: [0.76315789 0.8 0.93055556 0.88157895 0.90410959 0.84415584 0.7875 0.81333333 0.7721519 0.93055556] mean value: 0.8427098618480825 key: train_jcc value: [0.95721078 0.96815287 0.96031746 0.95886076 0.95721078 0.95748031 0.96196513 0.96513471 0.96343402 0.95418641] mean value: 0.9603953231785132 MCC on Blind test: 0.23 Accuracy on Blind test: 0.82 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [3.24940395 3.52079797 3.22503304 3.50924063 3.05567026 2.2908988 2.29794836 2.28992009 2.27863598 2.27230477] mean value: 2.7989853858947753 key: score_time value: [0.01407909 0.01406145 0.01401496 0.01402545 0.00979352 0.00969934 0.00976634 0.0096848 0.00975347 0.009619 ] mean value: 0.011449742317199706 key: test_mcc value: [0.95629932 0.8687127 0.8978896 0.89863497 0.91176471 0.88273483 0.83832595 0.91215932 0.90184995 0.94280904] mean value: 0.901118038297138 key: train_mcc value: [0.99022004 0.99185136 0.98859135 0.99022004 0.99023327 0.99022801 0.99348534 0.99185799 0.99348534 0.99022801] mean value: 0.9910400765606606 key: test_accuracy value: [0.97810219 0.93430657 0.94890511 0.94890511 0.95588235 0.94117647 0.91911765 0.95588235 0.94852941 0.97058824] mean value: 0.9501395448690425 key: train_accuracy value: [0.99511002 0.99592502 0.99429503 0.99511002 0.99511401 0.99511401 0.99674267 0.99592834 0.99674267 0.99511401] mean value: 0.9955195798125244 key: test_fscore value: [0.97777778 0.93430657 0.94964029 0.95035461 0.95588235 0.94029851 0.91970803 0.95522388 0.95104895 0.97142857] mean value: 0.9505669537495187 key: train_fscore value: [0.99511401 0.99592502 0.99428571 0.99510604 0.99510604 0.99511401 0.99674267 0.99593165 0.99674267 0.99511401] mean value: 0.9955181819751661 key: test_precision value: [0.98507463 0.92753623 0.94285714 0.93055556 0.95588235 0.95454545 0.91304348 0.96969697 0.90666667 0.94444444] mean value: 0.9430302923718009 key: train_precision value: [0.99511401 0.99673736 0.99509804 0.99510604 0.99673203 
0.99511401 0.99674267 0.99512195 0.99674267 0.99511401] mean value: 0.9957622771290957 key: test_recall value: [0.97058824 0.94117647 0.95652174 0.97101449 0.95588235 0.92647059 0.92647059 0.94117647 1. 1. ] mean value: 0.9589300937766411 key: train_recall value: [0.99511401 0.99511401 0.99347471 0.99510604 0.99348534 0.99511401 0.99674267 0.99674267 0.99674267 0.99511401] mean value: 0.9952750131515322 key: test_roc_auc value: [0.97804774 0.93435635 0.9488491 0.94874254 0.95588235 0.94117647 0.91911765 0.95588235 0.94852941 0.97058824] mean value: 0.9501172208013641 key: train_roc_auc value: [0.99511002 0.99592568 0.99429436 0.99511002 0.99511401 0.99511401 0.99674267 0.99592834 0.99674267 0.99511401] mean value: 0.9955195785133188 key: test_jcc value: [0.95652174 0.87671233 0.90410959 0.90540541 0.91549296 0.88732394 0.85135135 0.91428571 0.90666667 0.94444444] mean value: 0.9062314140500687 key: train_jcc value: [0.99027553 0.99188312 0.98863636 0.99025974 0.99025974 0.99027553 0.99350649 0.99189627 0.99350649 0.99027553] mean value: 0.9910774800564104 MCC on Blind test: 0.74 Accuracy on Blind test: 0.94 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: 
UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.06899261 0.05181146 0.05468035 0.05141401 0.04903173 0.04878616 0.06030512 0.05749083 0.11495543 0.06219625] mean value: 0.06196639537811279 key: score_time value: [0.01346803 0.01373625 0.01389027 0.01407194 0.01384234 0.01389575 0.01589203 0.01564479 0.01802635 0.01396275] mean value: 0.014643049240112305 key: test_mcc value: [0.74077551 0.69429215 0.84660737 0.70450233 0.6799747 0.66012934 0.6 0.70321085 0.61134064 0.77459667] mean value: 0.7015429553097181 key: train_mcc value: [0.72381466 0.73489837 0.74503794 0.70437109 0.71316163 0.6643151 0.70797069 0.6911857 0.64653991 0.69633693] mean value: 0.7027632013908158 key: test_accuracy value: [0.8540146 0.82481752 0.91970803 0.83211679 0.81617647 0.80882353 0.76470588 0.83088235 0.77205882 0.875 ] mean value: 0.8298303993130098 key: train_accuracy value: [0.84433578 0.85167074 0.85737571 0.83211084 0.83713355 0.80618893 0.83387622 0.8232899 0.79478827 0.82654723] mean value: 0.8307317176769166 key: test_fscore value: [0.87179487 0.85 0.92517007 0.85714286 0.8447205 0.8375 0.80952381 0.85534591 0.81437126 0.88888889] mean value: 0.8554458161706764 key: train_fscore value: [0.86520819 0.87055477 0.87491065 0.85594406 0.85994398 0.83765348 0.8575419 0.84982699 0.82972973 0.85218598] mean value: 0.8553499715201868 key: test_precision value: [0.77272727 0.73913043 0.87179487 0.75 0.7311828 0.72826087 0.68 0.74725275 0.68686869 0.8 ] mean value: 0.750721767869033 key: train_precision value: [0.7633873 0.77272727 0.77862595 0.74908201 0.75429975 0.72065728 0.75061125 0.73886883 0.70900693 0.74244256] mean 
value: 0.7479709134762966 key: test_recall value: [1. 1. 0.98550725 1. 1. 0.98529412 1. 1. 1. 1. ] mean value: 0.997080136402387 key: train_recall value: [0.99837134 0.99674267 0.99836868 0.99836868 1. 1. 1. 1. 1. 1. ] mean value: 0.9991851363774038 key: test_roc_auc value: [0.85507246 0.82608696 0.91922421 0.83088235 0.81617647 0.80882353 0.76470588 0.83088235 0.77205882 0.875 ] mean value: 0.8298913043478261 key: train_roc_auc value: [0.84421014 0.85155241 0.85749053 0.83224623 0.83713355 0.80618893 0.83387622 0.8232899 0.79478827 0.82654723] mean value: 0.8307323410790102 key: test_jcc value: [0.77272727 0.73913043 0.86075949 0.75 0.7311828 0.72043011 0.68 0.74725275 0.68686869 0.8 ] mean value: 0.7488351538528009 key: train_jcc value: [0.76243781 0.77078086 0.77763659 0.74816626 0.75429975 0.72065728 0.75061125 0.73886883 0.70900693 0.74244256] mean value: 0.7474908124059836 MCC on Blind test: 0.02 Accuracy on Blind test: 0.65 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04560471 0.05994201 0.07156658 0.0524919 0.05347729 0.05321288 0.05366182 0.04788947 0.05366254 0.05350876] mean value: 0.054501795768737794 key: score_time value: [0.0202415 0.02025533 0.01818871 0.02007699 0.02113891 0.02003241 0.02011871 0.02007985 0.01999617 0.02001643] mean value: 0.020014500617980956 key: test_mcc value: [0.88355744 0.79705571 0.8978896 0.87308606 0.91334626 0.85442069 0.82928843 0.88273483 0.82928843 0.90184995] mean value: 0.8662517400011381 key: train_mcc value: [0.90300839 0.91267625 0.89468261 0.90436304 0.90139896 0.90309017 0.90601824 0.90152353 0.90623002 0.9029702 ] mean value: 0.9035961418167106 key: test_accuracy value: [0.94160584 0.89781022 0.94890511 0.93430657 0.95588235 0.92647059 0.91176471 0.94117647 0.91176471 0.94852941] mean value: 0.9318215972520395 key: train_accuracy value: [0.95110024 0.95599022 0.94702526 0.95191524 0.95032573 0.95114007 0.95276873 0.95032573 0.95276873 0.95114007] mean value: 0.9514500025219743 key: test_fscore value: [0.94029851 0.9 0.94964029 0.93793103 0.95714286 0.92857143 0.91666667 0.94029851 0.91666667 0.95104895] mean value: 0.9338264907274486 key: train_fscore value: [0.95215311 0.95686901 0.94795837 0.95268645 0.95131684 0.95215311 0.95352564 0.95139442 0.95367412 0.95207668] mean value: 0.9523807745491089 key: test_precision value: [0.95454545 0.875 0.94285714 0.89473684 0.93055556 0.90277778 0.86842105 0.95454545 0.86842105 0.90666667] mean value: 0.9098526999316473 key: train_precision value: [0.9328125 0.93887147 0.93081761 0.93690852 0.93270736 0.9328125 0.9384858 
0.93135725 0.93573668 0.93416928] mean value: 0.9344678970829278 key: test_recall value: [0.92647059 0.92647059 0.95652174 0.98550725 0.98529412 0.95588235 0.97058824 0.92647059 0.97058824 1. ] mean value: 0.9603793691389599 key: train_recall value: [0.9723127 0.97557003 0.96574225 0.96900489 0.97068404 0.9723127 0.96905537 0.9723127 0.9723127 0.97068404] mean value: 0.9709991444861868 key: test_roc_auc value: [0.94149616 0.8980179 0.9488491 0.93393009 0.95588235 0.92647059 0.91176471 0.94117647 0.91176471 0.94852941] mean value: 0.9317881500426257 key: train_roc_auc value: [0.95108294 0.95597425 0.94704051 0.95192916 0.95032573 0.95114007 0.95276873 0.95032573 0.95276873 0.95114007] mean value: 0.9514495911069073 key: test_jcc value: [0.88732394 0.81818182 0.90410959 0.88311688 0.91780822 0.86666667 0.84615385 0.88732394 0.84615385 0.90666667] mean value: 0.8763505422482849 key: train_jcc value: [0.9086758 0.91730475 0.90106545 0.90964778 0.90715373 0.9086758 0.91117917 0.90729483 0.91145038 0.90853659] mean value: 0.9090984275974558 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.68297338 0.58741832 0.45031023 0.64178634 0.64991045 0.64547777 0.64209366 0.61012626 0.55511141 0.53283501] mean value: 0.5998042821884155 key: score_time value: [0.02061009 0.02557373 0.02007389 0.02004433 0.02005386 0.01999784 0.02003598 0.01997614 0.02412701 0.01992822] mean value: 0.0210421085357666 key: test_mcc value: [0.8978896 0.87099729 0.91277477 0.9001543 0.8979331 0.85331034 0.78632938 0.91215932 0.88580789 0.89949371] mean value: 0.8816849688659582 key: train_mcc value: [0.91404667 0.92692597 0.91396326 0.91229254 0.91728977 0.91728977 0.92206383 0.91085966 0.91747489 0.90910351] mean value: 0.91613098611342 key: test_accuracy value: [0.94890511 0.93430657 0.95620438 0.94890511 0.94852941 0.92647059 0.88970588 0.95588235 0.94117647 0.94852941] mean value: 0.9398615285530271 key: train_accuracy value: [0.95680522 0.96332518 0.95680522 0.95599022 0.95846906 0.95846906 0.96091205 0.95521173 0.95846906 0.95439739] mean value: 0.9578854174133038 key: test_fscore value: [0.94814815 0.93617021 0.95714286 0.95104895 0.94964029 0.92753623 0.89655172 0.95522388 0.94366197 0.95035461] mean value: 0.9415478875254766 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:136: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:139: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.957498 0.96379726 0.95736122 0.95652174 0.95903614 0.95903614 0.96135266 0.95589415 0.95916733 0.95498392] mean value: 0.9584648570657469 key: test_precision value: [0.95522388 0.90410959 0.94366197 0.91891892 0.92957746 0.91428571 0.84415584 0.96969697 0.90540541 0.91780822] mean value: 0.9202843977898764 key: train_precision value: [0.94312796 0.95230525 0.94444444 0.94435612 0.94611727 0.94611727 0.95063694 0.94154818 0.94330709 0.94285714] mean value: 0.945481767751615 key: test_recall value: [0.94117647 0.97058824 0.97101449 0.98550725 0.97058824 0.94117647 0.95588235 0.94117647 0.98529412 0.98529412] mean value: 0.964769820971867 key: train_recall value: [0.9723127 0.97557003 0.97063622 0.96900489 0.9723127 0.9723127 0.9723127 0.97068404 0.97557003 0.96742671] mean value: 0.9718142737963027 key: test_roc_auc value: [0.9488491 0.93456948 0.95609548 0.94863598 0.94852941 0.92647059 0.88970588 0.95588235 0.94117647 0.94852941] mean value: 0.9398444160272805 key: train_roc_auc value: [0.95679257 0.9633152 0.95681648 0.95600082 0.95846906 0.95846906 0.96091205 0.95521173 0.95846906 0.95439739] mean value: 0.9578853398940438 key: test_jcc value: [0.90140845 0.88 0.91780822 0.90666667 0.90410959 0.86486486 0.8125 0.91428571 0.89333333 0.90540541] mean value: 0.8900382243479388 key: train_jcc value: [0.91846154 0.93012422 0.91820988 0.91666667 0.9212963 0.9212963 0.9255814 0.91551459 0.92153846 0.91384615] mean value: 0.9202535501533893 MCC on Blind test: 0.65 Accuracy on Blind test: 0.91 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04414916 0.08380175 0.07032514 0.07745457 0.05293298 0.05717969 0.05249119 0.05452228 0.08637619 0.11670065] mean value: 0.06959335803985596 key: score_time value: [0.01268363 0.01511669 0.01902628 0.01525545 0.01350904 0.01557541 0.01358795 0.0154953 0.01878643 0.01540351] mean value: 0.015443968772888183 key: test_mcc value: [0.812277 0.75191816 0.88320546 0.83947987 0.82495791 0.82675403 0.73817324 0.84051051 0.72443685 0.78017138] mean value: 0.8021884424651033 key: train_mcc value: [0.83048073 0.85004477 0.83701763 0.81907163 0.83553259 0.82747132 0.84527799 0.82736156 0.85343831 0.84366611] mean value: 0.8369362649023049 key: test_accuracy value: [0.90510949 0.87591241 0.94160584 0.91970803 0.91176471 0.91176471 0.86764706 0.91911765 0.86029412 0.88970588] mean value: 0.9002629884070417 key: train_accuracy value: [0.91524042 0.92502037 0.91850041 0.90953545 0.91775244 0.91368078 0.92263844 0.91368078 0.9267101 0.9218241 ] mean value: 0.9184583303467847 key: test_fscore value: [0.90076336 0.87591241 0.94202899 0.92086331 0.90909091 0.91549296 0.87323944 0.91603053 0.86713287 0.89208633] mean value: 0.9012641098273885 key: train_fscore value: [0.91530945 0.92520325 0.91816694 0.90938776 0.91808597 0.91297209 0.92270138 0.91368078 0.92694805 0.92156863] mean value: 0.9184024291795301 
key: test_precision value: [0.93650794 0.86956522 0.94202899 0.91428571 0.9375 0.87837838 0.83783784 0.95238095 0.82666667 0.87323944] mean value: 0.8968391125575755 key: train_precision value: [0.91530945 0.9237013 0.92118227 0.91013072 0.91437803 0.9205298 0.92195122 0.91368078 0.92394822 0.92459016] mean value: 0.9189401945593438 key: test_recall value: [0.86764706 0.88235294 0.94202899 0.92753623 0.88235294 0.95588235 0.91176471 0.88235294 0.91176471 0.91176471] mean value: 0.9075447570332481 key: train_recall value: [0.91530945 0.9267101 0.91517129 0.908646 0.9218241 0.90553746 0.92345277 0.91368078 0.92996743 0.91856678] mean value: 0.9178866151941378 key: test_roc_auc value: [0.90483802 0.87595908 0.94160273 0.91965047 0.91176471 0.91176471 0.86764706 0.91911765 0.86029412 0.88970588] mean value: 0.900234441602728 key: train_roc_auc value: [0.91524037 0.925019 0.9184977 0.90953473 0.91775244 0.91368078 0.92263844 0.91368078 0.9267101 0.9218241 ] mean value: 0.9184578433612659 key: test_jcc value: [0.81944444 0.77922078 0.89041096 0.85333333 0.83333333 0.84415584 0.775 0.84507042 0.7654321 0.80519481] mean value: 0.8210596019887293 key: train_jcc value: [0.84384384 0.86081694 0.84871407 0.83383234 0.84857571 0.83987915 0.85649547 0.84107946 0.86384266 0.85454545] mean value: 0.8491625104737037 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])
key: fit_time value: [1.32345986 1.43449521 1.27052402 1.54930925 1.47475839 1.37269783 1.32320762 1.29081202 1.4030478 1.44133997] mean value: 1.3883651971817017
key: score_time value: [0.01582932 0.02128839 0.01553774 0.01557136 0.01531792 0.01843786 0.02614975 0.01983738 0.01549149 0.01612997] mean value: 0.017959117889404297
key: test_mcc value: [0.94160273 0.87631485 0.92787101 0.8502811 0.86849267 0.92898531 0.82928843 0.92737353 0.82675403 0.85628096] mean value: 0.8833244627512812
key: train_mcc value: [0.93540003 0.96791376 0.94666567 0.94883081 0.93851413 0.95851448 0.95806651 0.95199397 0.90417865 0.92389522] mean value: 0.9433973227538303
key: test_accuracy value: [0.97080292 0.93430657 0.96350365 0.91970803 0.93382353 0.96323529 0.91176471 0.96323529 0.91176471 0.92647059] mean value: 0.9398615285530271
key: train_accuracy value: [0.96740016 0.98370008 0.97310513 0.97392013 0.96905537 0.97882736 0.97882736 0.97557003 0.9519544 0.96172638] mean value: 0.971408642142457
key: test_fscore value: [0.97058824 0.93793103 0.96453901 0.9261745 0.9352518 0.96453901 0.91666667 0.96402878 0.91549296 0.92957746] mean value: 0.9424789445347015
key: train_fscore value: [0.968 0.98397436 0.97349398 0.97448166 0.96950241 0.97926635 0.97913323 0.97607656 0.95253419 0.96230954] mean value: 0.9718772264685587
key: test_precision value: [0.97058824 0.88311688 0.94444444 0.8625 0.91549296 0.93150685 0.86842105 0.94366197 0.87837838 0.89189189] mean value: 0.9090002664649828
key: train_precision value: [0.95125786 0.96845426 0.95886076 0.95319813 0.9556962 0.959375 0.96518987 0.95625 0.94117647 0.9478673 ] mean value: 0.9557325852844888
key: test_recall value: [0.97058824 1. 0.98550725 1. 0.95588235 1. 0.97058824 0.98529412 0.95588235 0.97058824] mean value: 0.9794330775788577
key: train_recall value: [0.98534202 1. 0.98858075 0.99673736 0.98371336 1. 0.99348534 0.99674267 0.96416938 0.9771987 ] mean value: 0.9885969573465256
key: test_roc_auc value: [0.97080136 0.93478261 0.96334186 0.91911765 0.93382353 0.96323529 0.91176471 0.96323529 0.91176471 0.92647059] mean value: 0.9398337595907928
key: train_roc_auc value: [0.96738553 0.98368679 0.97311774 0.97393871 0.96905537 0.97882736 0.97882736 0.97557003 0.9519544 0.96172638] mean value: 0.9714089674851614
key: test_jcc value: [0.94285714 0.88311688 0.93150685 0.8625 0.87837838 0.93150685 0.84615385 0.93055556 0.84415584 0.86842105] mean value: 0.8919152401479367
key: train_jcc value: [0.9379845 0.96845426 0.94835681 0.95023328 0.94080997 0.959375 0.9591195 0.95327103 0.9093702 0.92735703] mean value: 0.9454331569694207
MCC on Blind test: 0.57
Accuracy on Blind test: 0.89
Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest
Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0196569 0.01368809 0.01368642 0.01357722 0.01296353 0.01416087 0.01404524 0.01523566 0.01527238 0.01575828] mean value: 0.014804458618164063 key: score_time value: [0.01393223 0.0100534 0.00978374 0.00978899 0.00955415 0.0099647 0.01073837 0.0102942 0.01072192 0.01070642] mean value: 0.010553812980651856 key: test_mcc value: [0.64295346 0.51887407 0.71597934 0.63063055 0.66356093 0.60352881 0.69305253 0.68120121 0.54559454 0.69731096] mean value: 0.6392686404912149 key: train_mcc value: [0.6456698 0.66922404 0.66332001 0.64476831 0.62687987 0.63075937 0.66776333 0.62667844 0.67204092 0.63837522] mean value: 0.6485479305996751 key: test_accuracy value: [0.81751825 0.75912409 0.8540146 0.81021898 0.83088235 0.80147059 0.84558824 0.83823529 0.77205882 0.84558824] mean value: 0.8174699441820524 key: train_accuracy value: [0.82233089 0.83374083 0.83129584 0.8190709 0.81188925 0.81433225 0.83306189 0.81188925 0.83550489 0.81840391] mean value: 0.8231519901032417 key: test_fscore value: [0.8 0.76258993 0.84375 0.79365079 0.82442748 0.8057554 0.83969466 0.828125 0.76335878 0.83464567] mean value: 0.8095997702713673 key: train_fscore value: [0.81742044 0.8277027 0.82706767 0.80492091 0.80205656 0.80645161 0.82700422 0.80239521 0.83082077 0.81181435] mean value: 0.8157654434944623 key: test_precision value: [0.87719298 0.74647887 0.91525424 0.87719298 0.85714286 0.78873239 0.87301587 0.88333333 0.79365079 0.89830508] mean value: 0.851029941169467 
key: train_precision value: [0.84137931 0.85964912 0.84760274 0.87238095 0.84629295 0.84219858 0.85814361 0.84504505 0.85517241 0.84238179] mean value: 0.8510246507261562 key: test_recall value: [0.73529412 0.77941176 0.7826087 0.72463768 0.79411765 0.82352941 0.80882353 0.77941176 0.73529412 0.77941176] mean value: 0.7742540494458653 key: train_recall value: [0.79478827 0.7980456 0.80750408 0.74714519 0.76221498 0.77361564 0.7980456 0.76384365 0.80781759 0.78338762] mean value: 0.7836408223560106 key: test_roc_auc value: [0.81692242 0.7592711 0.85453964 0.81084825 0.83088235 0.80147059 0.84558824 0.83823529 0.77205882 0.84558824] mean value: 0.8175404944586531 key: train_roc_auc value: [0.82235335 0.83376995 0.83127647 0.81901233 0.81188925 0.81433225 0.83306189 0.81188925 0.83550489 0.81840391] mean value: 0.8231493535822648 key: test_jcc value: [0.66666667 0.61627907 0.72972973 0.65789474 0.7012987 0.6746988 0.72368421 0.70666667 0.61728395 0.71621622] mean value: 0.681041874351185 key: train_jcc value: [0.69121813 0.70605187 0.70512821 0.67352941 0.6695279 0.67567568 0.70503597 0.67 0.71060172 0.68323864] mean value: 0.6890007519859123 MCC on Blind test: 0.37 Accuracy on Blind test: 0.78 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02289462 0.01865649 0.01860642 0.01872373 0.01842427 0.01855087 0.01844025 0.01858234 0.01843119 0.0383122 ] mean value: 0.020962238311767578 key: score_time value: [0.0129528 0.01290703 0.01294899 0.01289344 0.01297498 0.01293111 0.01290679 0.01291537 0.01290774 0.01293302] mean value: 0.01292712688446045 key: test_mcc value: [0.59324085 0.56235346 0.62041773 0.50373224 0.63242133 0.67911938 0.4738791 0.70710678 0.5008673 0.66240967] mean value: 0.5935547845683921 key: train_mcc value: [0.64145228 0.63325194 0.6040146 0.5949754 0.59609121 0.60099713 0.62396473 0.60912052 0.63687624 0.60749911] mean value: 0.6148243154248934 key: test_accuracy value: [0.79562044 0.7810219 0.81021898 0.75182482 0.81617647 0.83823529 0.73529412 0.85294118 0.75 0.83088235] mean value: 0.7962215543151567 key: train_accuracy value: [0.8207009 0.81662592 0.80195599 0.79706601 0.7980456 0.8004886 0.81188925 0.80456026 0.81840391 0.80374593] mean value: 0.8073482368744508 key: test_fscore value: [0.78461538 0.7826087 0.8115942 0.75714286 0.81751825 0.84507042 0.75 0.84848485 0.75714286 0.83453237] mean value: 0.7988709890747785 key: train_fscore value: [0.82200647 0.81692433 0.80355699 0.79128248 0.7980456 0.79967294 0.81415929 0.80456026 0.81972514 0.80422421] mean value: 0.8074157715143396 key: test_precision value: [0.82258065 0.77142857 0.8115942 0.74647887 0.8115942 0.81081081 0.71052632 0.875 0.73611111 0.81690141] mean value: 0.79130261417885 key: train_precision value: [0.81672026 0.81626016 0.79647436 0.8137931 0.7980456 0.80295567 0.80445151 0.80456026 0.81380417 0.80226904] mean 
value: 0.8069334137924529 key: test_recall value: [0.75 0.79411765 0.8115942 0.76811594 0.82352941 0.88235294 0.79411765 0.82352941 0.77941176 0.85294118] mean value: 0.8079710144927537 key: train_recall value: [0.82736156 0.81758958 0.81076672 0.76998369 0.7980456 0.79641694 0.82410423 0.80456026 0.8257329 0.80618893] mean value: 0.8080750407830343 key: test_roc_auc value: [0.79528986 0.78111679 0.81020887 0.75170503 0.81617647 0.83823529 0.73529412 0.85294118 0.75 0.83088235] mean value: 0.7961849957374254 key: train_roc_auc value: [0.82069546 0.81662513 0.80196317 0.79704396 0.7980456 0.8004886 0.81188925 0.80456026 0.81840391 0.80374593] mean value: 0.8073461270730269 key: test_jcc value: [0.64556962 0.64285714 0.68292683 0.6091954 0.69135802 0.73170732 0.6 0.73684211 0.6091954 0.71604938] mean value: 0.6665701226720038 key: train_jcc value: [0.6978022 0.69050894 0.67162162 0.65464632 0.66395664 0.66621253 0.68656716 0.67302452 0.69452055 0.67255435] mean value: 0.6771414841563377 MCC on Blind test: 0.34 Accuracy on Blind test: 0.71 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01670122 0.01258683 0.01203728 0.01207185 0.0117681 0.01326847 0.01234245 0.01252675 0.01242781 0.01216888] mean value: 0.01278996467590332 key: score_time value: [0.04255915 0.02082086 0.02272153 0.02175403 0.02360296 0.02218795 0.02258205 0.02123189 0.02138758 0.02124214] mean value: 0.024009013175964357 key: test_mcc value: [0.81460896 0.73858362 0.80402464 0.67267776 0.72698376 0.77459667 0.76409318 0.79967098 0.75653442 0.8722811 ] mean value: 0.7724055093528688 key: train_mcc value: [0.85191645 0.85040708 0.85300846 0.85379503 0.85346882 0.87206933 0.85100719 0.85930172 0.86550007 0.8553372 ] mean value: 0.8565811351622082 key: test_accuracy value: [0.90510949 0.86861314 0.89781022 0.82481752 0.86029412 0.875 0.875 0.89705882 0.86764706 0.93382353] mean value: 0.8805173894375269 key: train_accuracy value: [0.92176039 0.92176039 0.92257539 0.92257539 0.92345277 0.93241042 0.92100977 0.92589577 0.92915309 0.92345277] mean value: 0.9244046149476093 key: test_fscore value: [0.90909091 0.87142857 0.90540541 0.84615385 0.86896552 0.88888889 0.88590604 0.90277778 0.88157895 0.93706294] mean value: 0.8897258840686593 key: train_fscore value: [0.92694064 0.92649311 0.92742552 0.92764661 0.92791411 0.93649579 0.92634776 0.93048128 0.93343535 0.92846271] mean value: 0.9291642877686668 key: test_precision value: [0.86666667 0.84722222 0.84810127 0.75862069 0.81818182 0.8 0.81481481 0.85526316 0.79761905 0.89333333] mean value: 0.8299823016210597 key: train_precision value: [0.87 0.87427746 0.87212644 0.87 0.87681159 0.88311688 0.86770982 0.87625899 0.88023088 
0.87142857] mean value: 0.8741960630292233 key: test_recall value: [0.95588235 0.89705882 0.97101449 0.95652174 0.92647059 1. 0.97058824 0.95588235 0.98529412 0.98529412] mean value: 0.9604006820119353 key: train_recall value: [0.99185668 0.98534202 0.99021207 0.99347471 0.98534202 0.99674267 0.99348534 0.99185668 0.99348534 0.99348534] mean value: 0.9915282877502112 key: test_roc_auc value: [0.90547741 0.86881927 0.89727195 0.8238491 0.86029412 0.875 0.875 0.89705882 0.86764706 0.93382353] mean value: 0.880424126172208 key: train_roc_auc value: [0.92170322 0.92170853 0.92263047 0.92263312 0.92345277 0.93241042 0.92100977 0.92589577 0.92915309 0.92345277] mean value: 0.9244049927998682 key: test_jcc value: [0.83333333 0.7721519 0.82716049 0.73333333 0.76829268 0.8 0.79518072 0.82278481 0.78823529 0.88157895] mean value: 0.802205151665905 key: train_jcc value: [0.86382979 0.86305278 0.86467236 0.86505682 0.86552217 0.88057554 0.86280057 0.87 0.87517934 0.86647727] mean value: 0.8677166644458821 MCC on Blind test: 0.36 Accuracy on Blind test: 0.85 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.07805991 0.07112241 0.07135248 0.0707829 0.07207608 0.07116532 0.07046342 0.07161427 0.06986785 0.07115483] mean value: 0.07176594734191895 key: score_time value: [0.02446485 0.02405882 0.02418971 0.02379894 0.02402472 0.02377605 0.02382326 0.02369189 0.02512693 0.02385998] mean value: 0.02408151626586914 key: test_mcc value: [0.67983923 0.75261265 0.87099729 0.82788248 0.79411765 0.82495791 0.72254413 0.79446135 0.69305253 0.83905224] mean value: 0.7799517454933128 key: train_mcc value: [0.81255164 0.83715827 0.83586511 0.84556016 0.82757672 0.84536769 0.83387622 0.83559466 0.84697743 0.82578657] mean value: 0.8346314474250649 key: test_accuracy value: [0.83941606 0.87591241 0.93430657 0.91240876 0.89705882 0.91176471 0.86029412 0.89705882 0.84558824 0.91911765] mean value: 0.8892926148561614 key: train_accuracy value: [0.90627547 0.91850041 0.91768541 0.92257539 0.91368078 0.92263844 0.91693811 0.91775244 0.92345277 0.91286645] mean value: 0.9172365665044638 key: test_fscore value: [0.83333333 0.87769784 0.93233083 0.91666667 0.89705882 0.91428571 0.86524823 0.89552239 0.85106383 0.92086331] mean value: 0.8904070960759222 key: train_fscore value: [0.90642799 0.91935484 0.91900561 0.92369478 0.91465378 0.92320129 0.91693811 0.91835085 0.92394822 0.91336032] mean value: 0.9178935802733702 key: test_precision value: [0.859375 0.85915493 0.96875 0.88 0.89705882 0.88888889 0.83561644 0.90909091 0.82191781 0.90140845] mean value: 0.8821261248366242 key: train_precision value: [0.90569106 0.91054313 0.90378549 0.90981013 0.9044586 0.91653291 0.91693811 0.9117175 
0.91800643 0.90821256] mean value: 0.9105695905456304 key: test_recall value: [0.80882353 0.89705882 0.89855072 0.95652174 0.89705882 0.94117647 0.89705882 0.88235294 0.88235294 0.94117647] mean value: 0.9002131287297528 key: train_recall value: [0.90716612 0.92833876 0.93474715 0.93800979 0.92508143 0.92996743 0.91693811 0.92508143 0.92996743 0.91856678] mean value: 0.9253864424972501 key: test_roc_auc value: [0.83919437 0.87606564 0.93456948 0.9120844 0.89705882 0.91176471 0.86029412 0.89705882 0.84558824 0.91911765] mean value: 0.8892796248934356 key: train_roc_auc value: [0.90627474 0.91849238 0.91769931 0.92258796 0.91368078 0.92263844 0.91693811 0.91775244 0.92345277 0.91286645] mean value: 0.9172383376463273 key: test_jcc value: [0.71428571 0.78205128 0.87323944 0.84615385 0.81333333 0.84210526 0.7625 0.81081081 0.74074074 0.85333333] mean value: 0.8038553760486674 key: train_jcc value: [0.82886905 0.85074627 0.85014837 0.85820896 0.84272997 0.85735736 0.84661654 0.8490284 0.85864662 0.84053651] mean value: 0.8482888038296238 MCC on Blind test: 0.52 Accuracy on Blind test: 0.85 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [5.95455098 2.9566772 5.66405582 4.63643026 3.77995276 8.58176708 6.5036099 2.79477501 8.09094572 7.14753866] mean value: 5.611030340194702 key: score_time value: [0.01393247 0.02110529 0.01480699 0.01343775 0.03119874 0.01591206 0.01321292 0.02379489 0.02104306 0.01340723] mean value: 0.018185138702392578 key: test_mcc value: [0.98550725 0.84393916 0.97120941 0.9158731 0.94280904 0.91533482 0.88580789 0.90184995 0.95681396 0.97100831] mean value: 0.9290152902280305 key: train_mcc value: [0.9967453 0.96446746 0.99674532 0.99188303 0.97427222 0.99837266 0.98047683 0.89328059 0.99674796 0.99188957] mean value: 0.9784880937538125 key: test_accuracy value: [0.99270073 0.91970803 0.98540146 0.95620438 0.97058824 0.95588235 0.94117647 0.94852941 0.97794118 0.98529412] mean value: 0.9633426363246028 key: train_accuracy value: [0.99837001 0.98207009 0.99837001 0.99592502 0.98697068 0.99918567 0.99022801 0.94381107 0.99837134 0.99592834] mean value: 0.9889230240330883 key: test_fscore value: [0.99270073 0.92307692 0.98571429 0.95833333 0.97142857 0.95774648 0.94366197 0.95104895 0.97841727 0.98550725] mean value: 0.9647635757797159 key: train_fscore value: [0.99837398 0.98231511 0.99837134 0.99593826 0.98713826 
0.99918633 0.99025974 0.94680031 0.99837398 0.99594485] mean value: 0.9892702169739379 key: test_precision value: [0.98550725 0.88 0.97183099 0.92 0.94444444 0.91891892 0.90540541 0.90666667 0.95774648 0.97142857] mean value: 0.9361948718029551 key: train_precision value: [0.99675325 0.96984127 0.99674797 0.99190939 0.97460317 0.99837398 0.98705502 0.89897511 0.99675325 0.99192246] mean value: 0.9802934855848118 key: test_recall value: [1. 0.97058824 1. 1. 1. 1. 0.98529412 1. 1. 1. ] mean value: 0.9955882352941177 key: train_recall value: [1. 0.99511401 1. 1. 1. 1. 0.99348534 1. 1. 1. ] mean value: 0.9988599348534202 key: test_roc_auc value: [0.99275362 0.92007673 0.98529412 0.95588235 0.97058824 0.95588235 0.94117647 0.94852941 0.97794118 0.98529412] mean value: 0.9633418584825234 key: train_roc_auc value: [0.99836868 0.98205945 0.99837134 0.99592834 0.98697068 0.99918567 0.99022801 0.94381107 0.99837134 0.99592834] mean value: 0.988922291714269 key: test_jcc value: [0.98550725 0.85714286 0.97183099 0.92 0.94444444 0.91891892 0.89333333 0.90666667 0.95774648 0.97142857] mean value: 0.9327019503100336 key: train_jcc value: [0.99675325 0.96524487 0.99674797 0.99190939 0.97460317 0.99837398 0.9807074 0.89897511 0.99675325 0.99192246] mean value: 0.9791990831042809 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.07329679 0.0687356 0.0710535 0.07005835 0.07708025 0.08290029 0.06846547 0.07136536 0.06657863 0.074476 ] mean value: 0.07240102291107178 key: score_time value: [0.01113582 0.01123357 0.01132107 0.01113486 0.01153493 0.0116272 0.01152778 0.01141524 0.01147556 0.01147532] mean value: 0.011388134956359864 key: test_mcc value: [0.91597649 0.92951942 1. 0.94318882 0.94280904 0.95681396 0.88852332 0.95681396 0.88852332 0.95681396] mean value: 0.9378982292461375 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95620438 0.96350365 1. 0.97080292 0.97058824 0.97794118 0.94117647 0.97794118 0.94117647 0.97794118] mean value: 0.9677275654787463 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95774648 0.96453901 1. 0.97183099 0.97142857 0.97841727 0.94444444 0.97841727 0.94444444 0.97841727] mean value: 0.9689685730759542 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91891892 0.93150685 1. 0.94520548 0.94444444 0.95774648 0.89473684 0.95774648 0.89473684 0.95774648] mean value: 0.9402788812960731 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95652174 0.96376812 1. 
0.97058824 0.97058824 0.97794118 0.94117647 0.97794118 0.94117647 0.97794118] mean value: 0.9677642796248934 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.91891892 0.93150685 1. 0.94520548 0.94444444 0.95774648 0.89473684 0.95774648 0.89473684 0.95774648] mean value: 0.9402788812960731 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.61 Accuracy on Blind test: 0.91 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.23720622 0.2307353 0.23288369 0.236022 0.22834826 0.2347486 0.2318604 0.23015618 0.22972083 0.23117995] mean value: 0.23228614330291747 key: score_time value: [0.02257872 0.02273583 0.02290273 0.02357268 0.02258182 0.02260256 0.02258444 0.02248144 0.02255177 0.02253366] mean value: 0.022712564468383788 key: test_mcc value: [1. 0.94323594 1. 0.98550418 0.98540068 1. 0.98540068 1. 0.98540068 1. ] mean value: 0.9884942149819166 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97080292 1. 0.99270073 0.99264706 1. 0.99264706 1. 0.99264706 1. ] mean value: 0.9941444826105625 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [1. 0.97142857 1. 0.99280576 0.99270073 1. 0.99270073 1. 0.99270073 1. ] mean value: 0.9942336516605277 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.94444444 1. 0.98571429 0.98550725 1. 0.98550725 1. 0.98550725 1. ] mean value: 0.9886680469289165 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.97101449 1. 0.99264706 0.99264706 1. 0.99264706 1. 0.99264706 1. ] mean value: 0.994160272804774 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.94444444 1. 0.98571429 0.98550725 1. 0.98550725 1. 0.98550725 1. ] mean value: 0.9886680469289165 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01662683 0.01681709 0.01678419 0.0168705 0.01649547 0.01645446 0.0163095 0.0163238 0.01661205 0.01642942] mean value: 0.016572332382202147 key: score_time value: [0.01098466 0.01094747 0.01095343 0.01092076 0.01088595 0.01097775 0.01093483 0.01091647 0.01096272 0.01096725] mean value: 0.01094512939453125 key: test_mcc value: [0.90259957 0.92951942 0.90246052 0.9158731 0.8753478 0.88852332 0.8623165 0.90184995 0.91533482 0.94280904] mean value: 0.9036634035137634 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94890511 0.96350365 0.94890511 0.95620438 0.93382353 0.94117647 0.92647059 0.94852941 0.95588235 0.97058824] mean value: 0.9493988836410476 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95104895 0.96453901 0.95172414 0.95833333 0.93793103 0.94444444 0.93150685 0.95104895 0.95774648 0.97142857] mean value: 0.951975175899855 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90666667 0.93150685 0.90789474 0.92 0.88311688 0.89473684 0.87179487 0.90666667 0.91891892 0.94444444] mean value: 0.9085746879870888 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.94927536 0.96376812 0.94852941 0.95588235 0.93382353 0.94117647 0.92647059 0.94852941 0.95588235 0.97058824] mean value: 0.9493925831202046 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90666667 0.93150685 0.90789474 0.92 0.88311688 0.89473684 0.87179487 0.90666667 0.91891892 0.94444444] mean value: 0.9085746879870888 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.75 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [3.64647102 3.58429003 3.61780548 3.60837865 3.61581659 3.61313367 3.7440722 4.07526445 3.05344391 3.19134378] mean value: 3.5750019788742065 key: score_time value: [0.12135196 0.12102342 0.12090707 0.12046289 0.1204288 0.11982846 0.24435544 0.11087871 0.10419559 0.11555219] mean value: 0.12989845275878906 key: test_mcc value: [1. 0.95713391 1. 0.98550418 0.95681396 1. 0.95681396 0.97100831 0.91533482 0.98540068] mean value: 0.972800982776614 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97810219 1. 
0.99270073 0.97794118 1. 0.97794118 0.98529412 0.95588235 0.99264706] mean value: 0.986050880206097 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.97841727 1. 0.99280576 0.97841727 1. 0.97841727 0.98550725 0.95774648 0.99270073] mean value: 0.9864012009133892 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95774648 1. 0.98571429 0.95774648 1. 0.95774648 0.97142857 0.91891892 0.98550725] mean value: 0.9734808459058306 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.97826087 1. 0.99264706 0.97794118 1. 0.97794118 0.98529412 0.95588235 0.99264706] mean value: 0.9860613810741689 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.95774648 1. 0.98571429 0.95774648 1. 0.95774648 0.97142857 0.91891892 0.98550725] mean value: 0.9734808459058306 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.88 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [2.45984983 2.94581032 2.88509512 2.17885375 1.31969047 1.28217196 1.37900114 1.31501865 1.3011024 1.29474759] mean value: 1.836134123802185 key: score_time value: [0.16247296 0.25455213 0.21584201 0.22554564 0.26957107 0.22703886 0.18415976 0.24923444 0.21996379 0.1540041 ] mean value: 0.21623847484588624 key: test_mcc value: [0.98550418 0.91281179 0.98550725 0.95710706 0.91176471 0.98540068 0.95681396 0.95598573 0.90184995 0.95681396] mean value: 0.950955925944105 key: train_mcc value: [0.98536269 0.98533135 0.98536289 0.98372107 0.98046123 0.98373423 0.99188957 0.98376033 0.98697592 0.98373423] mean value: 0.9850333498563252 key: test_accuracy value: [0.99270073 0.95620438
0.99270073 0.97810219 0.95588235 0.99264706 0.97794118 0.97794118 0.94852941 0.97794118] mean value: 0.9750590382138257 key: train_accuracy value: [0.99266504 0.99266504 0.99266504 0.99185004 0.99022801 0.99185668 0.99592834 0.99185668 0.99348534 0.99185668] mean value: 0.9925056877158611 key: test_fscore value: [0.99259259 0.95652174 0.99270073 0.9787234 0.95588235 0.99270073 0.97841727 0.97810219 0.95104895 0.97841727] mean value: 0.9755107221977611 key: train_fscore value: [0.99270073 0.99267697 0.99268887 0.99186992 0.9902439 0.99188312 0.99594485 0.99189627 0.99349593 0.99188312] mean value: 0.9925283686021121 key: test_precision value: [1. 0.94285714 1. 0.95833333 0.95588235 0.98550725 0.95774648 0.97101449 0.90666667 0.95774648] mean value: 0.9635754192675233 key: train_precision value: [0.98869144 0.99186992 0.98867314 0.98865478 0.98863636 0.98867314 0.99192246 0.98709677 0.99188312 0.98867314] mean value: 0.9894774265463709 key: test_recall value: [0.98529412 0.97058824 0.98550725 1. 0.95588235 1. 1. 0.98529412 1. 1. ] mean value: 0.9882566069906223 key: train_recall value: [0.99674267 0.99348534 0.99673736 0.99510604 0.99185668 0.99511401 1. 
0.99674267 0.99511401 0.99511401] mean value: 0.9956012774255942 key: test_roc_auc value: [0.99264706 0.95630861 0.99275362 0.97794118 0.95588235 0.99264706 0.97794118 0.97794118 0.94852941 0.97794118] mean value: 0.9750532821824383 key: train_roc_auc value: [0.99266171 0.99266437 0.99266835 0.99185269 0.99022801 0.99185668 0.99592834 0.99185668 0.99348534 0.99185668] mean value: 0.9925058849785591 key: test_jcc value: [0.98529412 0.91666667 0.98550725 0.95833333 0.91549296 0.98550725 0.95774648 0.95714286 0.90666667 0.95774648] mean value: 0.9526104049703163 key: train_jcc value: [0.98550725 0.98546042 0.98548387 0.98387097 0.98067633 0.98389694 0.99192246 0.98392283 0.98707593 0.98389694] mean value: 0.9851713928531682 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, 
num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02761936 0.01877713 0.01874304 0.01867342 0.01853132 0.0185585 0.03597736 0.01850724 0.01871991 0.01877713] mean value: 0.021288442611694335 key: score_time value: [0.01272559 0.01290345 0.01310086 0.01296473 0.01297116 0.01303959 0.02226686 0.01296878 0.0130465 0.0130434 ] mean value: 0.013903093338012696 key: test_mcc value: [0.59324085 0.56235346 0.62041773 0.50373224 0.63242133 0.67911938 0.4738791 0.70710678 0.5008673 0.66240967] mean value: 0.5935547845683921 key: train_mcc value: [0.64145228 0.63325194 0.6040146 0.5949754 0.59609121 0.60099713 0.62396473 0.60912052 0.63687624 0.60749911] mean value: 0.6148243154248934 key: test_accuracy value: [0.79562044 0.7810219 0.81021898 0.75182482 0.81617647 0.83823529 0.73529412 0.85294118 0.75 0.83088235] mean value: 0.7962215543151567 key: train_accuracy value: [0.8207009 0.81662592 0.80195599 0.79706601 0.7980456 0.8004886 0.81188925 0.80456026 0.81840391 0.80374593] mean value: 0.8073482368744508 key: test_fscore value: [0.78461538 0.7826087 0.8115942 0.75714286 0.81751825 0.84507042 0.75 0.84848485 0.75714286 0.83453237] mean value: 0.7988709890747785 key: train_fscore value: [0.82200647 0.81692433 0.80355699 0.79128248 0.7980456 0.79967294 0.81415929 0.80456026 0.81972514 0.80422421] mean value: 0.8074157715143396 key: test_precision value: [0.82258065 0.77142857 0.8115942 0.74647887 0.8115942 0.81081081 0.71052632 0.875 0.73611111 0.81690141] mean value: 0.79130261417885 key: train_precision value: [0.81672026 0.81626016 0.79647436 0.8137931 0.7980456 0.80295567 0.80445151 0.80456026 0.81380417 0.80226904] mean 
value: 0.8069334137924529 key: test_recall value: [0.75 0.79411765 0.8115942 0.76811594 0.82352941 0.88235294 0.79411765 0.82352941 0.77941176 0.85294118] mean value: 0.8079710144927537 key: train_recall value: [0.82736156 0.81758958 0.81076672 0.76998369 0.7980456 0.79641694 0.82410423 0.80456026 0.8257329 0.80618893] mean value: 0.8080750407830343 key: test_roc_auc value: [0.79528986 0.78111679 0.81020887 0.75170503 0.81617647 0.83823529 0.73529412 0.85294118 0.75 0.83088235] mean value: 0.7961849957374254 key: train_roc_auc value: [0.82069546 0.81662513 0.80196317 0.79704396 0.7980456 0.8004886 0.81188925 0.80456026 0.81840391 0.80374593] mean value: 0.8073461270730269 key: test_jcc value: [0.64556962 0.64285714 0.68292683 0.6091954 0.69135802 0.73170732 0.6 0.73684211 0.6091954 0.71604938] mean value: 0.6665701226720038 key: train_jcc value: [0.6978022 0.69050894 0.67162162 0.65464632 0.66395664 0.66621253 0.68656716 0.67302452 0.69452055 0.67255435] mean value: 0.6771414841563377 MCC on Blind test: 0.34 Accuracy on Blind test: 0.71 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [2.5532043 2.48869276 2.57855439 5.6198473 9.45530009 7.02551937 7.66338468 6.32294011 7.51284075 7.98216033] mean value: 5.920244407653809 key: score_time value: [0.01291919 0.01340175 0.01387787 0.03123879 0.04060364 0.01863265 0.01957917 0.0157547 0.02303362 0.03177905] mean value: 0.022082042694091798 key: test_mcc value: [1. 0.95713391 0.97120941 0.92944673 0.97100831 0.98540068 0.94280904 0.94280904 0.91533482 0.94280904] mean value: 0.9557960991328401 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97810219 0.98540146 0.96350365 0.98529412 0.99264706 0.97058824 0.97058824 0.95588235 0.97058824] mean value: 0.9772595534564191 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.97841727 0.98571429 0.96503497 0.98550725 0.99270073 0.97142857 0.97142857 0.95774648 0.97142857] mean value: 0.9779406686399074 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95774648 0.97183099 0.93243243 0.97142857 0.98550725 0.94444444 0.94444444 0.91891892 0.94444444] mean value: 0.9571197967278801 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.97826087 0.98529412 0.96323529 0.98529412 0.99264706 0.97058824 0.97058824 0.95588235 0.97058824] mean value: 0.9772378516624041 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.95774648 0.97183099 0.93243243 0.97142857 0.98550725 0.94444444 0.94444444 0.91891892 0.94444444] mean value: 0.9571197967278801 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.82 Accuracy on Blind test: 0.95 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.06341195 0.12838125 0.12218761 0.11328745 0.0931344 0.10351372 0.09466314 0.09430265 0.10396051 0.13386202] mean value: 0.10507047176361084 key: score_time value: [0.02171898 0.01990557 0.02713132 0.02102256 0.02411985 0.02375531 0.02291679 0.02295637 0.02355218 0.02601242] mean value: 0.02330913543701172 key: test_mcc value: [0.87086187 0.84393916 0.8251972 0.81433714 0.75008111 0.88388348 0.71364124 0.91176471 0.8131434 0.82675403] mean value: 0.8253603346628763 key: train_mcc value: [0.8599829 0.88152381 0.86490711 0.87027085 0.87543967 0.8684941 0.88152966 0.86385236 0.87175299 0.86206788] mean value: 0.8699821323180184 key: test_accuracy value: [0.93430657 0.91970803 0.91240876 0.90510949 0.875 
0.94117647 0.85294118 0.95588235 0.90441176 0.91176471] mean value: 0.9112709317303563 key: train_accuracy value: [0.92991035 0.9405053 0.93235534 0.93480033 0.93729642 0.93403909 0.94055375 0.93159609 0.93566775 0.93078176] mean value: 0.9347506165563635 key: test_fscore value: [0.93129771 0.92307692 0.91176471 0.91034483 0.87591241 0.94285714 0.8630137 0.95588235 0.90909091 0.91549296] mean value: 0.9138733636494115 key: train_fscore value: [0.93064516 0.94155324 0.93301049 0.936 0.93864542 0.93504411 0.9414595 0.93290735 0.93664796 0.93194556] mean value: 0.9357858782984593 key: test_precision value: [0.96825397 0.88 0.92537313 0.86842105 0.86956522 0.91666667 0.80769231 0.95588235 0.86666667 0.87837838] mean value: 0.8936899744950406 key: train_precision value: [0.92172524 0.92598425 0.92332268 0.91836735 0.91887676 0.92101106 0.92733017 0.9153605 0.92259084 0.91653543] mean value: 0.9211104281448699 key: test_recall value: [0.89705882 0.97058824 0.89855072 0.95652174 0.88235294 0.97058824 0.92647059 0.95588235 0.95588235 0.95588235] mean value: 0.9369778346121057 key: train_recall value: [0.93973941 0.95765472 0.94290375 0.954323 0.95928339 0.9495114 0.95602606 0.95114007 0.95114007 0.94788274] mean value: 0.9509604603833339 key: test_roc_auc value: [0.93403666 0.92007673 0.91251066 0.90473146 0.875 0.94117647 0.85294118 0.95588235 0.90441176 0.91176471] mean value: 0.9112531969309463 key: train_roc_auc value: [0.92990233 0.94049131 0.93236393 0.93481622 0.93729642 0.93403909 0.94055375 0.93159609 0.93566775 0.93078176] mean value: 0.9347508648128763 key: test_jcc value: [0.87142857 0.85714286 0.83783784 0.83544304 0.77922078 0.89189189 0.75903614 0.91549296 0.83333333 0.84415584] mean value: 0.8424983255310591 key: train_jcc value: [0.87028658 0.88956127 0.87443268 0.87969925 0.88438438 0.87801205 0.88939394 0.8742515 0.88084465 0.87256372] mean value: 0.8793430005520554 MCC on Blind test: 0.52 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: 
MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01756454 0.01748395 0.01753092 0.0176425 0.01723886 0.01738024 0.01743436 0.01817703 0.01737046 0.01723433] mean value: 0.017505717277526856 key: score_time value: [0.01266932 0.01253963 0.01251888 0.01266742 0.01258802 0.01264358 0.01258373 0.01260686 0.01254725 0.01253533] mean value: 0.01259000301361084 key: test_mcc value: [0.62437433 0.57838662 0.63063055 0.5912191 0.70618786 0.63242133 0.63242133 0.66529914 0.51610295 0.66183628] mean value: 0.623887949482616 key: train_mcc value: [0.62428456 0.65029189 0.62875517 0.64013666 0.61125608 0.62447543 0.6435692 0.62899827 0.64544441 0.64994144] mean value: 0.6347153109168008 key: test_accuracy value: [0.81021898 0.78832117 0.81021898 0.79562044 0.85294118 0.81617647 0.81617647 0.83088235 0.75735294 0.83088235] mean value: 0.8108791326749678 key: train_accuracy value: [0.81173594 0.82477588 0.81418093 0.8198859 0.80537459 0.81188925 0.82166124 0.81433225 0.82247557 0.82491857] mean value: 0.817123011290481 key: test_fscore value: [0.796875 0.79432624 0.79365079 0.79710145 0.85074627 0.81751825 0.81481481 0.82170543 0.7480916 0.82962963] mean value: 0.8064459474747275 key: train_fscore value: [0.80701754 0.8206839 0.81063123 0.81659751 0.80133001 0.80733945 0.81915772 0.81125828 
0.81893688 0.82333607] mean value: 0.8136288592998409 key: test_precision value: [0.85 0.76712329 0.87719298 0.79710145 0.86363636 0.8115942 0.82089552 0.86885246 0.77777778 0.8358209 ] mean value: 0.8269994940642269 key: train_precision value: [0.82847341 0.84102564 0.82571912 0.83108108 0.81833616 0.82735043 0.83082077 0.82491582 0.83559322 0.83084577] mean value: 0.8294161432878052 key: test_recall value: [0.75 0.82352941 0.72463768 0.79710145 0.83823529 0.82352941 0.80882353 0.77941176 0.72058824 0.82352941] mean value: 0.7889386189258312 key: train_recall value: [0.78664495 0.80130293 0.79608483 0.80261011 0.78501629 0.78827362 0.80781759 0.7980456 0.8029316 0.81596091] mean value: 0.7984688428245772 key: test_roc_auc value: [0.80978261 0.7885763 0.81084825 0.79560955 0.85294118 0.81617647 0.81617647 0.83088235 0.75735294 0.83088235] mean value: 0.8109228473998296 key: train_roc_auc value: [0.81175641 0.82479502 0.81416619 0.81987183 0.80537459 0.81188925 0.82166124 0.81433225 0.82247557 0.82491857] mean value: 0.8171240920129018 key: test_jcc value: [0.66233766 0.65882353 0.65789474 0.6626506 0.74025974 0.69135802 0.6875 0.69736842 0.59756098 0.70886076] mean value: 0.6764614452108327 key: train_jcc value: [0.67647059 0.69589816 0.68156425 0.69004208 0.66851595 0.67692308 0.69370629 0.68245125 0.69338959 0.69972067] mean value: 0.6858681907721815 MCC on Blind test: 0.39 Accuracy on Blind test: 0.75 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.06309485 0.04974627 0.0544436 0.06055784 0.03951502 0.0307796 0.04072857 0.043648 0.03793311 0.03963995] mean value: 0.04600868225097656 key: score_time value: [0.0201335 0.012568 0.0126636 0.01268744 0.01263571 0.01257944 0.01267719 0.01261306 0.01259804 0.01271319] mean value: 0.013386917114257813 key: test_mcc value: [0.86948194 0.81247516 0.67142918 0.87308606 0.72627304 0.77949606 0.69128005 0.89715584 0.72669793 0.76409318] mean value: 0.781146844024664 key: train_mcc value: [0.85485659 0.87290331 0.70082438 0.86315544 0.72408634 0.82740107 0.8625736 0.86858635 0.77661648 0.8250729 ] mean value: 0.8176076462340217 key: test_accuracy value: [0.93430657 0.90510949 0.81021898 0.93430657 0.85294118 0.88970588 0.83823529 0.94852941 0.84558824 0.875 ] mean value: 0.8833941605839416 key: train_accuracy value: [0.92583537 0.93643032 0.83537082 0.93154034 0.85016287 0.91368078 0.92915309 0.93403909 0.8762215 0.90798046] mean value: 0.9040414639132016 key: test_fscore value: [0.9352518 0.90780142 0.76785714 0.93793103 0.83333333 0.89051095 0.85333333 0.94890511 0.86624204 0.88590604] mean value: 0.8827072197886613 key: train_fscore value: [0.92896175 0.93617021 0.80725191 0.93192869 0.82835821 0.91325696 0.93250582 0.93514812 0.88985507 0.91432904] mean value: 0.901776576833011 key: test_precision value: [0.91549296 0.87671233 1. 
0.89473684 0.96153846 0.88405797 0.7804878 0.94202899 0.76404494 0.81481481] mean value: 0.8833915110192154 key: train_precision value: [0.89205397 0.94078947 0.97241379 0.92592593 0.96943231 0.91776316 0.89037037 0.91968504 0.80156658 0.85531915] mean value: 0.9085319776343379 key: test_recall value: [0.95588235 0.94117647 0.62318841 0.98550725 0.73529412 0.89705882 0.94117647 0.95588235 1. 0.97058824] mean value: 0.9005754475703325 key: train_recall value: [0.96905537 0.93159609 0.69004894 0.93800979 0.72312704 0.90879479 0.97882736 0.95114007 1. 0.98208469] mean value: 0.9072684134735455 key: test_roc_auc value: [0.93446292 0.90537084 0.8115942 0.93393009 0.85294118 0.88970588 0.83823529 0.94852941 0.84558824 0.875 ] mean value: 0.8835358056265985 key: train_roc_auc value: [0.92580012 0.93643426 0.83525248 0.93154561 0.85016287 0.91368078 0.92915309 0.93403909 0.8762215 0.90798046] mean value: 0.9040270257344931 key: test_jcc value: [0.87837838 0.83116883 0.62318841 0.88311688 0.71428571 0.80263158 0.74418605 0.90277778 0.76404494 0.79518072] mean value: 0.7938959282695474 key: train_jcc value: [0.86734694 0.88 0.6768 0.87253414 0.70700637 0.84036145 0.87354651 0.87819549 0.80156658 0.84217877] mean value: 0.8239536247559656 MCC on Blind test: 0.58 Accuracy on Blind test: 0.88 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.04600525 0.0352726 0.05412579 0.04311061 0.04166174 0.03660035 0.03348541 0.0660789 0.03916955 0.04450202] mean value: 0.044001221656799316 key: score_time value: [0.01239896 0.01267171 0.01264596 0.01269054 0.01274228 0.01272917 0.01265216 0.01130629 0.01268458 0.01304388] mean value: 0.01255655288696289 key: test_mcc value: [0.68527704 0.34901614 0.84660737 0.65589003 0.81961843 0.82388584 0.78017138 0.85442069 0.76470588 0.75653442] mean value: 0.7336127219455257 key: train_mcc value: [0.76254001 0.4686876 0.82112674 0.64474867 0.74249993 0.87026566 0.8681346 0.87473322 0.86578708 0.83150658] mean value: 0.7750030085751844 key: test_accuracy value: [0.82481752 0.6350365 0.91970803 0.81021898 0.90441176 0.91176471 0.88970588 0.92647059 0.88235294 0.86764706] mean value: 0.8572133963074281 key: train_accuracy value: [0.87286064 0.68296659 0.90301548 0.79869601 0.86074919 0.93485342 0.93403909 0.93729642 0.93241042 0.91042345] mean value: 0.8767310699277122 key: test_fscore value: [0.78947368 0.45652174 0.92517007 0.77586207 0.896 0.91044776 0.89208633 0.92424242 0.88235294 0.88157895] mean value: 0.8333735965250287 key: train_fscore value: [0.85818182 0.53964497 0.91139241 0.75175879 0.84210526 0.93366501 0.93366093 0.93672966 0.93077565 0.91704374] mean value: 0.855495824279099 key: test_precision value: [0.97826087 0.875 0.87179487 0.95744681 0.98245614 0.92424242 0.87323944 0.953125 0.88235294 0.79761905] mean value: 0.9095537539879266 key: train_precision value: [0.97119342 0.98701299 0.83835616 0.97905759 0.97228145 0.95101351 
0.93904448 0.94527363 0.95384615 0.85393258] mean value: 0.9391011973075327 key: test_recall value: [0.66176471 0.30882353 0.98550725 0.65217391 0.82352941 0.89705882 0.91176471 0.89705882 0.88235294 0.98529412] mean value: 0.8005328218243819 key: train_recall value: [0.76872964 0.3713355 0.99836868 0.61011419 0.74267101 0.91693811 0.92833876 0.92833876 0.90879479 0.99022801] mean value: 0.8163857463959487 key: test_roc_auc value: [0.82363598 0.63267263 0.91922421 0.81138107 0.90441176 0.91176471 0.88970588 0.92647059 0.88235294 0.86764706] mean value: 0.856926683716965 key: train_roc_auc value: [0.87294557 0.68322077 0.90309313 0.79854244 0.86074919 0.93485342 0.93403909 0.93729642 0.93241042 0.91042345] mean value: 0.8767573900983574 key: test_jcc value: [0.65217391 0.29577465 0.86075949 0.63380282 0.8115942 0.83561644 0.80519481 0.85915493 0.78947368 0.78823529] mean value: 0.7331780225858255 key: train_jcc value: [0.75159236 0.36952998 0.8372093 0.60225443 0.72727273 0.8755832 0.87557604 0.88098918 0.87051482 0.84679666] mean value: 0.763731869782806 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.43791914 0.44354892 0.59035397 0.4713099 0.5238173 0.48726654 0.46352696 0.45602107 0.49114156 0.51594257] mean value: 0.4880847930908203 key: score_time value: [0.02496362 0.02302623 0.02460146 0.02421665 0.02488732 0.0236969 0.03173876 0.02384472 0.02467775 0.02427077] mean value: 0.02499241828918457 key: test_mcc value: [0.94160273 0.92791659 0.91240409 0.87609014 0.91533482 0.92737353 0.90184995 0.94158382 0.89949371 0.91334626] mean value: 0.915699564376063 key: train_mcc value: [0.95970953 0.96772862 0.96266546 0.96633736 0.96951968 0.9547216 0.97086948 0.9630019 0.95617952 0.96109562] mean value: 0.9631828760865743 key: test_accuracy value: [0.97080292 0.96350365 0.95620438 0.93430657 0.95588235 0.96323529 0.94852941 0.97058824 0.94852941 0.95588235] mean value: 0.9567464577071705 key: train_accuracy value: [0.9796251 0.98370008 0.98125509 0.98288509 0.98452769 0.9771987 0.98534202 0.98127036 0.97801303 0.98045603] mean value: 0.9814273180262763 key: test_fscore value: [0.97058824 0.96402878 0.95652174 0.93877551 0.95774648 0.96240602 0.95104895 0.97101449 0.95035461 0.95714286] mean value: 0.9579627666392394 key: train_fscore value: [0.97995188 0.98392283 0.98140663 0.98315958 0.98476343 0.97749196 0.98548387 0.98155573 0.97820823 0.98064516] mean value: 0.9816589318161805 key: test_precision value: [0.97058824 0.94366197 0.95652174 0.88461538 0.91891892 0.98461538 0.90666667 0.95714286 0.91780822 0.93055556] mean value: 0.9371094932948388 key: train_precision value: [0.96524487 0.97142857 0.97275641 0.96687697 0.9699842 0.96507937 
0.97603834 0.96682464 0.9696 0.97124601] mean value: 0.9695079375901355 key: test_recall value: [0.97058824 0.98529412 0.95652174 1. 1. 0.94117647 1. 0.98529412 0.98529412 0.98529412] mean value: 0.9809462915601024 key: train_recall value: [0.99511401 0.99674267 0.99021207 1. 1. 0.99022801 0.99511401 0.99674267 0.98697068 0.99022801] mean value: 0.994135213692472 key: test_roc_auc value: [0.97080136 0.96366155 0.95620205 0.93382353 0.95588235 0.96323529 0.94852941 0.97058824 0.94852941 0.95588235] mean value: 0.9567135549872123 key: train_roc_auc value: [0.97961247 0.98368944 0.98126239 0.98289902 0.98452769 0.9771987 0.98534202 0.98127036 0.97801303 0.98045603] mean value: 0.9814271139427496 key: test_jcc value: [0.94285714 0.93055556 0.91666667 0.88461538 0.91891892 0.92753623 0.90666667 0.94366197 0.90540541 0.91780822] mean value: 0.9194692163578867 key: train_jcc value: [0.96069182 0.96835443 0.96349206 0.96687697 0.9699842 0.95597484 0.97138315 0.96377953 0.95734597 0.96202532] mean value: 0.9639908297791469 MCC on Blind test: 0.65 Accuracy on Blind test: 0.91 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.27846217 0.28161573 0.26877713 0.2692008 0.27700591 0.27466035 0.27887869 0.29725051 0.26974511 0.19499779] mean value: 0.269059419631958 key: score_time value: [0.03478312 0.03430986 0.03051233 0.04022145 0.03191257 0.03114676 0.03156161 0.03014851 0.03189254 0.04394031] mean value: 0.03404290676116943 key: test_mcc value: [0.95713391 0.94323594 0.95710706 0.94318882 0.94280904 0.97100831 0.91533482 0.95681396 0.91533482 0.94280904] mean value: 0.9444775740307054 key: train_mcc value: [0.99837133 1. 1. 0.99674532 1. 
0.99674796 0.99512588 0.99512588 0.99837266 0.99837266] mean value: 0.9978861698276916 key: test_accuracy value: [0.97810219 0.97080292 0.97810219 0.97080292 0.97058824 0.98529412 0.95588235 0.97794118 0.95588235 0.97058824] mean value: 0.9713986689566337 key: train_accuracy value: [0.999185 1. 1. 0.99837001 1. 0.99837134 0.997557 0.997557 0.99918567 0.99918567] mean value: 0.9989411689749369 key: test_fscore value: [0.97841727 0.97142857 0.9787234 0.97183099 0.97142857 0.98550725 0.95774648 0.97841727 0.95774648 0.97142857] mean value: 0.9722674840953918 key: train_fscore value: [0.99918633 1. 1. 0.99837134 1. 0.99837398 0.99756296 0.99756296 0.99918633 0.99918633] mean value: 0.9989430224185503 key: test_precision value: [0.95774648 0.94444444 0.95833333 0.94520548 0.94444444 0.97142857 0.91891892 0.95774648 0.91891892 0.94444444] mean value: 0.946163151313161 key: train_precision value: [0.99837398 1. 1. 0.99674797 1. 0.99675325 0.99513776 0.99513776 0.99837398 0.99837398] mean value: 0.9978898692194735 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97826087 0.97101449 0.97794118 0.97058824 0.97058824 0.98529412 0.95588235 0.97794118 0.95588235 0.97058824] mean value: 0.9713981244671782 key: train_roc_auc value: [0.99918434 1. 1. 0.99837134 1. 0.99837134 0.997557 0.997557 0.99918567 0.99918567] mean value: 0.9989412352344161 key: test_jcc value: [0.95774648 0.94444444 0.95833333 0.94520548 0.94444444 0.97142857 0.91891892 0.95774648 0.91891892 0.94444444] mean value: 0.946163151313161 key: train_jcc value: [0.99837398 1. 1. 0.99674797 1. 
0.99675325 0.99513776 0.99513776 0.99837398 0.99837398] mean value: 0.9978898692194735 MCC on Blind test: 0.66 Accuracy on Blind test: 0.92 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), 
('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [1.63152909 1.51536298 1.45650649 1.48660588 1.36039352 1.3729527 1.38597703 1.38487315 1.30995417 1.31962204] mean value: 1.4223777055740356 key: score_time value: [0.08072424 0.08191442 0.0774982 0.07979679 0.07489109 0.0739162 0.07431293 0.07339907 0.0698266 0.07072353] mean value: 0.07570030689239501 key: test_mcc value: [0.90025835 0.84393916 0.94160273 0.90246052 0.91533482 0.88852332 0.8722811 0.91215932 0.83666003 0.94158382] mean value: 0.8954803171637505 key: train_mcc value: [0.97232223 0.97396649 0.97235367 0.96906252 0.97070464 0.97232431 0.97558168 0.96583007 0.97394653 0.97234494] mean value: 0.9718437085256184 key: test_accuracy value: [0.94890511 0.91970803 0.97080292 0.94890511 0.95588235 0.94117647 0.93382353 0.95588235 0.91176471 0.97058824] mean value: 0.9457438814942035 key: train_accuracy value: [0.98614507 0.98696007 0.98614507 0.98451508 0.98534202 0.98615635 0.98778502 0.98289902 0.98697068 0.98615635] mean value: 0.9859074727427666 key: test_fscore value: [0.95035461 0.92307692 0.97101449 
0.95172414 0.95774648 0.94444444 0.93706294 0.95652174 0.91891892 0.97101449] mean value: 0.9481879174874257 key: train_fscore value: [0.98621249 0.98703404 0.98621249 0.98456539 0.98538961 0.98619009 0.98781478 0.98296837 0.98699187 0.98621249] mean value: 0.9859591623455506 key: test_precision value: [0.91780822 0.88 0.97101449 0.90789474 0.91891892 0.89473684 0.89333333 0.94285714 0.85 0.95714286] mean value: 0.9133706543131326 key: train_precision value: [0.9822294 0.98225806 0.98064516 0.98058252 0.98220065 0.98379254 0.98541329 0.97899838 0.98538961 0.9822294 ] mean value: 0.9823739031415591 key: test_recall value: [0.98529412 0.97058824 0.97101449 1. 1. 1. 0.98529412 0.97058824 1. 0.98529412] mean value: 0.9868073316283035 key: train_recall value: [0.99022801 0.99185668 0.99184339 0.98858075 0.98859935 0.98859935 0.99022801 0.98697068 0.98859935 0.99022801] mean value: 0.9895733589810353 key: test_roc_auc value: [0.9491688 0.92007673 0.97080136 0.94852941 0.95588235 0.94117647 0.93382353 0.95588235 0.91176471 0.97058824] mean value: 0.9457693947144075 key: train_roc_auc value: [0.98614174 0.98695607 0.98614971 0.98451839 0.98534202 0.98615635 0.98778502 0.98289902 0.98697068 0.98615635] mean value: 0.9859075354294308 key: test_jcc value: [0.90540541 0.85714286 0.94366197 0.90789474 0.91891892 0.89473684 0.88157895 0.91666667 0.85 0.94366197] mean value: 0.9019668318111609 key: train_jcc value: [0.9728 0.9744 0.9728 0.9696 0.9712 0.97275641 0.97592295 0.96650718 0.97431782 0.9728 ] mean value: 0.9723104357755392 MCC on Blind test: 0.26 Accuracy on Blind test: 0.83 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [2.14314961 1.94710898 1.98164988 2.30299282 2.00202966 2.19179535 2.07759929 1.97926068 2.0706048 2.0779984 ] mean value: 2.0774189472198485 key: score_time value: [0.01415825 0.01516271 0.01362658 0.01378727 0.03376055 0.01379681 0.01373887 0.01374412 0.01389074 0.01364255] mean value: 0.015930843353271485 key: test_mcc value: [0.98550725 0.95713391 0.95710706 0.88920184 0.97100831 0.95681396 0.88852332 0.97100831 0.90184995 0.92898531] mean value: 0.9407139215816761 key: train_mcc value: [0.99350111 0.99350111 0.98865451 0.99026748 0.99350642 0.9902753 0.9902753 0.98544789 0.98705447 0.9902753 ] mean value: 0.9902758884790179 key: test_accuracy value: [0.99270073 0.97810219 0.97810219 0.94160584 0.98529412 0.97794118 0.94117647 0.98529412 0.94852941 0.96323529] mean value: 0.9691981537140404 key: train_accuracy value: [0.99674002 0.99674002 0.99429503 0.99511002 0.99674267 0.99511401 0.99511401 0.99267101 0.99348534 0.99511401] mean value: 0.9951126127919849 key: test_fscore value: [0.99270073 0.97841727 0.9787234 0.94520548 0.98550725 0.97841727 0.94444444 0.98550725 0.95104895 0.96453901] mean value: 0.9704511041347699 key: train_fscore value: [0.99675325 0.99675325 0.99432279 0.99512987 0.99675325 0.99513776 0.99513776 0.99272433 0.99352751 0.99513776] mean value: 0.995137753160077 key: test_precision value: [0.98550725 0.95774648 0.95833333 0.8961039 0.97142857 0.95774648 0.89473684 0.97142857 0.90666667 0.93150685] mean value: 0.9431204934504661 key: train_precision value: [0.99352751 0.99352751 0.98870968 
0.99030695 0.99352751 0.99032258 0.99032258 0.98555377 0.98713826 0.99032258] mean value: 0.9903258926051111 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.99275362 0.97826087 0.97794118 0.94117647 0.98529412 0.97794118 0.94117647 0.98529412 0.94852941 0.96323529] mean value: 0.9691602728047741 key: train_roc_auc value: [0.99673736 0.99673736 0.99429967 0.99511401 0.99674267 0.99511401 0.99511401 0.99267101 0.99348534 0.99511401] mean value: 0.9951129437645796 key: test_jcc value: [0.98550725 0.95774648 0.95833333 0.8961039 0.97142857 0.95774648 0.89473684 0.97142857 0.90666667 0.93150685] mean value: 0.9431204934504661 key: train_jcc value: [0.99352751 0.99352751 0.98870968 0.99030695 0.99352751 0.99032258 0.99032258 0.98555377 0.98713826 0.99032258] mean value: 0.9903258926051111 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
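Note on the QDA blind-test figures reported further down in this log (MCC on Blind test: 0.0, Accuracy on Blind test: 0.86): these two numbers are what one would expect if the fitted model predicted class 0 for every blind-test sample, which would also fit the "no predicted samples" precision warning. The following is a minimal sketch of that arithmetic, not the pipeline's own code. It assumes the blind-test set is the 65-sample split with Counter({0: 56, 1: 9}) reported at the start of the run, and it assumes an all-negative prediction vector. The helper names are hypothetical, and returning 0.0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Assumed blind-test labels: 56 negatives and 9 positives, as in the log's y_test Counter.
y_true = [0] * 56 + [1] * 9
# Assumed behaviour: the classifier never predicts the positive class.
y_pred = [0] * 65

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
print(f"accuracy={acc:.2f} mcc={score:.1f}")
```

Under these assumptions accuracy only reflects the majority-class share of the blind set, while MCC shows no skill, so the near-perfect cross-validation scores for QDA should be read alongside the blind-test MCC rather than the accuracy.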
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.11367607 0.09891033 0.09971118 0.15056086 0.10581017 0.10995078 0.12116456 0.09784818 0.09590054 0.105196 ] mean value: 0.10987286567687989 key: score_time value: [0.02419925 0.01436257 0.01867843 0.02265835 0.02866459 0.02885079 0.02220368 0.02176857 0.05660486 0.04176497] mean value: 0.02797560691833496 key: test_mcc value: [1. 0.97122151 1. 1. 1. 0.98540068 0.98540068 1. 1. 1. ] mean value: 0.9942022862523039 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98540146 1. 1. 1. 0.99264706 0.99264706 1. 1. 1. ] mean value: 0.9970695577501073 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98550725 1. 1. 1. 0.99270073 0.99270073 1. 1. 1. ] mean value: 0.9970908706230827 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.97142857 1. 1. 1. 0.98550725 0.98550725 1. 1. 1. ] mean value: 0.9942443064182195 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.98550725 1. 1. 1. 0.99264706 0.99264706 1. 1. 1. ] mean value: 0.997080136402387 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.97142857 1. 1. 1. 0.98550725 0.98550725 1. 1. 1. ] mean value: 0.9942443064182195 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.86 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.06178665 0.08366227 0.05242634 0.06143975 0.05042601 0.04940557 0.04602766 0.0709939 0.04714084 0.06406546] mean value: 0.058737444877624514 key: score_time value: [0.04089236 0.02844954 0.0293684 0.04122448 0.0418787 0.04635525 0.04557586 0.04112935 0.03704405 0.02668762] mean value: 0.037860560417175296 key: test_mcc value: [0.87086187 0.85739162 0.85440207 0.85721269 0.82495791 0.808911 0.75665657 0.91176471 0.79967098 0.85628096] mean value: 0.8398110382588702 key: train_mcc value: [0.86319346 0.87482365 0.86512897 0.8699782 0.88172633 0.87147537 0.88972295 0.87492825 0.88788715 0.88320483] mean value: 0.8762069139371929 key: test_accuracy value: [0.93430657 0.9270073 0.9270073 0.9270073 0.91176471 0.90441176 0.875 0.95588235 0.89705882 0.92647059] mean value: 0.9185916702447402 key: 
train_accuracy value: [0.93154034 0.93724531 0.93235534 0.93480033 0.94055375 0.93566775 0.94462541 0.93729642 0.94381107 0.94136808] mean value: 0.9379263795863431 key: test_fscore value: [0.93129771 0.92957746 0.92647059 0.93055556 0.90909091 0.90510949 0.88275862 0.95588235 0.90277778 0.92957746] mean value: 0.9203097932842592 key: train_fscore value: [0.93214863 0.93815261 0.93333333 0.93569132 0.94164668 0.9362389 0.94551282 0.93815261 0.94448914 0.94230769] mean value: 0.9387673736356681 key: test_precision value: [0.96825397 0.89189189 0.94029851 0.89333333 0.9375 0.89855072 0.83116883 0.95588235 0.85526316 0.89189189] mean value: 0.9064034659476198 key: train_precision value: [0.92467949 0.92551506 0.9193038 0.92234548 0.92464678 0.928 0.93059937 0.92551506 0.93322734 0.92744479] mean value: 0.9261277169762157 key: test_recall value: [0.89705882 0.97058824 0.91304348 0.97101449 0.88235294 0.91176471 0.94117647 0.95588235 0.95588235 0.97058824] mean value: 0.9369352088661551 key: train_recall value: [0.93973941 0.95114007 0.94779772 0.94942904 0.95928339 0.94462541 0.96091205 0.95114007 0.95602606 0.95765472] mean value: 0.9517747926308909 key: test_roc_auc value: [0.93403666 0.9273231 0.92710997 0.92668372 0.91176471 0.90441176 0.875 0.95588235 0.89705882 0.92647059] mean value: 0.918574168797954 key: train_roc_auc value: [0.93153365 0.93723398 0.93236791 0.93481224 0.94055375 0.93566775 0.94462541 0.93729642 0.94381107 0.94136808] mean value: 0.9379270262658681 key: test_jcc value: [0.87142857 0.86842105 0.8630137 0.87012987 0.83333333 0.82666667 0.79012346 0.91549296 0.82278481 0.86842105] mean value: 0.8529815470114921 key: train_jcc value: [0.87291982 0.88350983 0.875 0.87915408 0.8897281 0.8801214 0.89665653 0.88350983 0.89481707 0.89090909] mean value: 0.8846325755943281 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.6412735 0.60913801 0.64228439 0.55539966 0.62312245 0.62011957 0.60597944 0.62014318 0.58393216 0.58312726] mean value: 0.6084519624710083 key: score_time value: [0.0293262 0.02866387 0.02711129 0.02833629 0.03939867 0.03987908 0.03436828 0.02823949 0.03884816 0.04016376] mean value: 0.03343350887298584 key: test_mcc value: [0.91392776 0.85739162 0.85440207 0.80402464 0.82495791 0.808911 0.74337629 0.92657079 0.8131434 0.85628096] mean value: 0.8402986441758847 key: train_mcc value: [0.8797564 0.87994298 0.86512897 0.8847922 0.88172633 0.87147537 0.89289191 0.87333954 0.88611102 0.88478855] mean value: 0.8799953256651453 key: test_accuracy value: [0.95620438 0.9270073 0.9270073 0.89781022 0.91176471 0.90441176 0.86764706 0.96323529 0.90441176 0.92647059] mean value: 0.918597037355088 key: train_accuracy value: [0.9396903 0.9396903 0.93235534 0.94213529 0.94055375 0.93566775 0.94625407 0.93648208 0.94299674 0.94218241] mean value: 0.9398008038461436 key: test_fscore value: [0.95454545 0.92957746 0.92647059 0.90540541 0.90909091 0.90510949 0.87671233 0.96296296 0.90909091 0.92957746] mean value: 0.9208542976726618 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the 
caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.94060995 0.9408 0.93333333 0.94306335 0.94164668 0.9362389 0.9470305 0.93739968 0.94345719 0.94306335] mean value: 0.9406642939843077 key: test_precision value: [0.984375 0.89189189 0.94029851 0.84810127 0.9375 0.89855072 0.82051282 0.97014925 0.86666667 0.89189189] mean value: 0.9049938022617767 key: train_precision value: [0.92721519 0.9245283 0.9193038 0.92744479 0.92464678 0.928 0.9335443 0.92405063 0.93589744 0.92890995] mean value: 0.9273541191183816 key: test_recall value: [0.92647059 0.97058824 0.91304348 0.97101449 0.88235294 0.91176471 0.94117647 0.95588235 0.95588235 0.97058824] mean value: 0.9398763853367433 key: train_recall value: [0.95439739 0.95765472 0.94779772 0.95921697 0.95928339 0.94462541 0.96091205 0.95114007 0.95114007 0.95765472] mean value: 0.9543822499481909 key: test_roc_auc value: [0.95598892 0.9273231 0.92710997 0.89727195 0.91176471 0.90441176 0.86764706 0.96323529 0.90441176 0.92647059] mean value: 0.9185635123614664 key: train_roc_auc value: [0.93967831 0.93967565 0.93236791 0.9421492 0.94055375 0.93566775 0.94625407 0.93648208 0.94299674 0.94218241] mean value: 0.939800787497808 key: test_jcc value: [0.91304348 0.86842105 0.8630137 0.82716049 0.83333333 0.82666667 0.7804878 0.92857143 0.83333333 0.86842105] mean value: 0.8542452342764135 key: train_jcc value: [0.88787879 0.88821752 0.875 0.892261 0.8897281 0.8801214 0.89939024 0.88217523 0.89296636 0.892261 ] mean value: 0.8879999637648476 MCC on Blind test: 0.58 Accuracy on Blind test: 0.88 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic 
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.06837058 0.07014608 0.07498884 0.09877658 0.08167028 0.07799149 0.08944106 0.09302378 0.11649036 0.10551286] mean value: 0.08764119148254394 key: score_time value: [0.02288461 0.022717 0.02236319 0.02235794 0.02454734 0.02454281 0.0242734 0.02331662 0.02335644 0.03278542] mean value: 0.024314475059509278 key: test_mcc value: [0.47727273 0.91666667 0.68313005 0.45454545 0.91287093 0.54772256 0.56694671 0.56694671 0.54772256 0.2773501 ] mean value: 0.5951174460874726 key: train_mcc value: [0.84076981 0.82168025 0.82148 0.8100405 0.8014439 0.83103945 0.8014439 0.81036475 0.83103945 0.81199182] mean value: 0.8181293826538867 key: test_accuracy value: [0.73913043 0.95652174 0.81818182 0.72727273 0.95454545 0.77272727 0.77272727 0.77272727 0.77272727 0.63636364] mean value: 0.792292490118577 key: train_accuracy value: [0.91959799 0.90954774 0.91 0.905 0.9 0.915 0.9 0.905 0.915 0.905 ] mean value: 0.9084145728643216 key: test_fscore value: [0.72727273 0.95652174 0.84615385 0.72727273 0.95652174 0.76190476 0.73684211 0.8 0.7826087 0.6 ] mean value: 0.7895098341780264 key: train_fscore value: [0.91752577 0.90526316 0.90721649 0.90452261 0.89690722 0.91282051 0.89690722 0.9035533 0.91282051 0.9015544 ] mean value: 0.905909120126948 key: test_precision value: [0.72727273 1. 
0.73333333 0.72727273 0.91666667 0.8 0.875 0.71428571 0.75 0.66666667] mean value: 0.7910497835497835 key: train_precision value: [0.94680851 0.94505495 0.93617021 0.90909091 0.92553191 0.93684211 0.92553191 0.91752577 0.93684211 0.93548387] mean value: 0.9314882262027278 key: test_recall value: [0.72727273 0.91666667 1. 0.72727273 1. 0.72727273 0.63636364 0.90909091 0.81818182 0.54545455] mean value: 0.8007575757575758 key: train_recall value: [0.89 0.86868687 0.88 0.9 0.87 0.89 0.87 0.89 0.89 0.87 ] mean value: 0.8818686868686869 key: test_roc_auc value: [0.73863636 0.95833333 0.81818182 0.72727273 0.95454545 0.77272727 0.77272727 0.77272727 0.77272727 0.63636364] mean value: 0.7924242424242424 key: train_roc_auc value: [0.91974747 0.90934343 0.91 0.905 0.9 0.915 0.9 0.905 0.915 0.905 ] mean value: 0.9084090909090909 key: test_jcc value: [0.57142857 0.91666667 0.73333333 0.57142857 0.91666667 0.61538462 0.58333333 0.66666667 0.64285714 0.42857143] mean value: 0.6646336996336997 key: train_jcc value: [0.84761905 0.82692308 0.83018868 0.82568807 0.81308411 0.83962264 0.81308411 0.82407407 0.83962264 0.82075472] mean value: 0.8280661175555043 MCC on Blind test: 0.49 Accuracy on Blind test: 0.83 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.98393965 1.9290812 2.16848993 1.89898634 1.75924253 1.58824778 1.68193698 1.5082283 0.945472 1.10976696] mean value: 1.6573391675949096 key: score_time value: [0.02464771 0.02251983 0.02378535 0.02463198 0.02498102 0.02502775 0.02192497 0.01491308 0.01212025 0.01495695] mean value: 0.020950889587402342 key: test_mcc value: [0.47727273 0.76764947 0.83205029 0.45454545 0.63636364 0.2773501 0.83205029 0.64715023 0.54772256 0.54772256] mean value: 0.6019877322488625 key: train_mcc value: [0.92035594 1. 0.92073688 0.89040077 1. 1. 1. 0.98 0.74014804 0.9900495 ] mean value: 0.9441691143454533 key: test_accuracy value: [0.73913043 0.86956522 0.90909091 0.72727273 0.81818182 0.63636364 0.90909091 0.81818182 0.77272727 0.77272727] mean value: 0.7972332015810276 key: train_accuracy value: [0.95979899 1. 0.96 0.945 1. 1. 1. 
0.99 0.87 0.995 ] mean value: 0.9719798994974874 key: test_fscore value: [0.72727273 0.85714286 0.91666667 0.72727273 0.81818182 0.66666667 0.9 0.83333333 0.7826087 0.7826087 ] mean value: 0.8011754187841145 key: train_fscore value: [0.95918367 1. 0.95918367 0.94416244 1. 1. 1. 0.99 0.86868687 0.99502488] mean value: 0.9716241527795758 key: test_precision value: [0.72727273 1. 0.84615385 0.72727273 0.81818182 0.61538462 1. 0.76923077 0.75 0.75 ] mean value: 0.8003496503496503 key: train_precision value: [0.97916667 1. 0.97916667 0.95876289 1. 1. 1. 0.99 0.87755102 0.99009901] mean value: 0.9774746250240425 key: test_recall value: [0.72727273 0.75 1. 0.72727273 0.81818182 0.72727273 0.81818182 0.90909091 0.81818182 0.81818182] mean value: 0.8113636363636364 key: train_recall value: [0.94 1. 0.94 0.93 1. 1. 1. 0.99 0.86 1. ] mean value: 0.966 key: test_roc_auc value: [0.73863636 0.875 0.90909091 0.72727273 0.81818182 0.63636364 0.90909091 0.81818182 0.77272727 0.77272727] mean value: 0.7977272727272727 key: train_roc_auc value: [0.95989899 1. 0.96 0.945 1. 1. 1. 0.99 0.87 0.995 ] mean value: 0.971989898989899 key: test_jcc value: [0.57142857 0.75 0.84615385 0.57142857 0.69230769 0.5 0.81818182 0.71428571 0.64285714 0.64285714] mean value: 0.6749500499500499 key: train_jcc value: [0.92156863 1. 0.92156863 0.89423077 1. 1. 1. 
0.98019802 0.76785714 0.99009901] mean value: 0.9475522196692843 MCC on Blind test: 0.49 Accuracy on Blind test: 0.83 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01519489 0.00943708 0.0094316 0.01037145 0.00944448 0.00915623 0.00911713 0.00927973 0.00896764 0.0090611 ] mean value: 0.009946131706237793 key: score_time value: [0.01964593 0.00931048 0.00915456 0.01004958 0.00895691 0.00901747 0.00918651 0.00874329 0.00890994 0.0088954 ] mean value: 0.010187005996704102 key: test_mcc value: [0.65909298 1. 0.2773501 0.2773501 0.68313005 0.32539569 0.48795004 0.46225016 0.29277002 0.09245003] mean value: 0.4557739170966826 key: train_mcc value: [0.59704686 0.6100504 0.61674214 0.64835272 0.64205788 0.57487842 0.67095904 0.63930569 0.59145083 0.60302269] mean value: 0.6193866676982089 key: test_accuracy value: [0.82608696 1. 0.63636364 0.63636364 0.81818182 0.63636364 0.72727273 0.72727273 0.63636364 0.54545455] mean value: 0.7189723320158102 key: train_accuracy value: [0.79396985 0.79899497 0.79 0.82 0.82 0.785 0.83 0.815 0.795 0.8 ] mean value: 0.8047964824120603 key: test_fscore value: [0.8 1. 
0.6 0.6 0.77777778 0.5 0.66666667 0.7 0.55555556 0.58333333] mean value: 0.6783333333333333 key: train_fscore value: [0.77595628 0.7752809 0.74698795 0.80434783 0.8125 0.77005348 0.81318681 0.79781421 0.78756477 0.78947368] mean value: 0.7873165908746415 key: test_precision value: [0.88888889 1. 0.66666667 0.66666667 1. 0.8 0.85714286 0.77777778 0.71428571 0.53846154] mean value: 0.790989010989011 key: train_precision value: [0.85542169 0.87341772 0.93939394 0.88095238 0.84782609 0.82758621 0.90243902 0.87951807 0.8172043 0.83333333] mean value: 0.8657092753553371 key: test_recall value: [0.72727273 1. 0.54545455 0.54545455 0.63636364 0.36363636 0.54545455 0.63636364 0.45454545 0.63636364] mean value: 0.6090909090909091 key: train_recall value: [0.71 0.6969697 0.62 0.74 0.78 0.72 0.74 0.73 0.76 0.75 ] mean value: 0.7246969696969697 key: test_roc_auc value: [0.8219697 1. 0.63636364 0.63636364 0.81818182 0.63636364 0.72727273 0.72727273 0.63636364 0.54545455] mean value: 0.718560606060606 key: train_roc_auc value: [0.79439394 0.79848485 0.79 0.82 0.82 0.785 0.83 0.815 0.795 0.8 ] mean value: 0.8047878787878788 key: test_jcc value: [0.66666667 1. 
0.42857143 0.42857143 0.63636364 0.33333333 0.5 0.53846154 0.38461538 0.41176471] mean value: 0.5328348122465769 key: train_jcc value: [0.63392857 0.63302752 0.59615385 0.67272727 0.68421053 0.62608696 0.68518519 0.66363636 0.64957265 0.65217391] mean value: 0.6496702807520676 MCC on Blind test: 0.44 Accuracy on Blind test: 0.83 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00917482 0.00915647 0.00912809 0.00903654 0.0090971 0.00903273 0.00915027 0.00949216 0.00917816 0.00924921] mean value: 0.009169554710388184 key: score_time value: [0.00869656 0.00861049 0.00864029 0.0086565 0.00863481 0.00874043 0.00908256 0.00869799 0.00867963 0.00878263] mean value: 0.008722186088562012 key: test_mcc value: [ 0.39393939 1. 0.46225016 0.18898224 0.46225016 0.54772256 0.2773501 0.36514837 -0.09245003 0.18257419] mean value: 0.3787767137904798 key: train_mcc value: [0.61070966 0.57109279 0.54043252 0.64205788 0.60048058 0.58292193 0.60108292 0.59360222 0.67573429 0.59675165] mean value: 0.601486644761032 key: test_accuracy value: [0.69565217 1. 
0.72727273 0.59090909 0.72727273 0.77272727 0.63636364 0.68181818 0.45454545 0.59090909] mean value: 0.6877470355731226 key: train_accuracy value: [0.8040201 0.7839196 0.77 0.82 0.8 0.79 0.8 0.795 0.835 0.795 ] mean value: 0.7992939698492463 key: test_fscore value: [0.69565217 1. 0.75 0.52631579 0.7 0.76190476 0.6 0.69565217 0.4 0.57142857] mean value: 0.6700953470633104 key: train_fscore value: [0.79581152 0.77005348 0.76530612 0.8125 0.79591837 0.77894737 0.79381443 0.78306878 0.82352941 0.77837838] mean value: 0.7897327858678965 key: test_precision value: [0.66666667 1. 0.69230769 0.625 0.77777778 0.8 0.66666667 0.66666667 0.44444444 0.6 ] mean value: 0.6939529914529914 key: train_precision value: [0.83516484 0.81818182 0.78125 0.84782609 0.8125 0.82222222 0.81914894 0.83146067 0.88505747 0.84705882] mean value: 0.8299870867646693 key: test_recall value: [0.72727273 1. 0.81818182 0.45454545 0.63636364 0.72727273 0.54545455 0.72727273 0.36363636 0.54545455] mean value: 0.6545454545454545 key: train_recall value: [0.76 0.72727273 0.75 0.78 0.78 0.74 0.77 0.74 0.77 0.72 ] mean value: 0.7537272727272727 key: test_roc_auc value: [0.6969697 1. 0.72727273 0.59090909 0.72727273 0.77272727 0.63636364 0.68181818 0.45454545 0.59090909] mean value: 0.6878787878787879 key: train_roc_auc value: [0.80424242 0.78363636 0.77 0.82 0.8 0.79 0.8 0.795 0.835 0.795 ] mean value: 0.7992878787878788 key: test_jcc value: [0.53333333 1. 
0.6 0.35714286 0.53846154 0.61538462 0.42857143 0.53333333 0.25 0.4 ] mean value: 0.5256227106227106 key: train_jcc value: [0.66086957 0.62608696 0.61983471 0.68421053 0.66101695 0.63793103 0.65811966 0.64347826 0.7 0.63716814] mean value: 0.6528715803016166 MCC on Blind test: 0.36 Accuracy on Blind test: 0.72 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00928378 0.00874376 0.00974417 0.00925326 0.00961709 0.00862384 0.00910091 0.00888395 0.00974298 0.00866914] mean value: 0.009166288375854491 key: score_time value: [0.01568413 0.01454878 0.01571488 0.01478529 0.01504016 0.01515985 0.01496387 0.0168469 0.01541829 0.01537824] mean value: 0.015354037284851074 key: test_mcc value: [0.17236256 0.83971912 0.46225016 0.36514837 0.2773501 0.18898224 0.46225016 0.36514837 0.64715023 0.09090909] mean value: 0.3871270410929328 key: train_mcc value: [0.54790792 0.45827063 0.5100255 0.58011603 0.55024767 0.61076393 0.56 0.53215963 0.51022966 0.56101073] mean value: 0.5420731707552674 key: test_accuracy value: [0.56521739 0.91304348 0.72727273 0.68181818 0.63636364 0.59090909 0.72727273 0.68181818 0.81818182 0.54545455] mean value: 0.6887351778656127 key: train_accuracy value: [0.77386935 0.72864322 0.755 0.79 
0.775 0.805 0.78 0.765 0.755 0.78 ] mean value: 0.7707512562814071 key: test_fscore value: [0.64285714 0.90909091 0.75 0.69565217 0.6 0.64 0.7 0.66666667 0.8 0.54545455] mean value: 0.6949721437982308 key: train_fscore value: [0.77832512 0.73529412 0.75376884 0.79207921 0.77832512 0.8097561 0.78 0.77511962 0.75862069 0.78640777] mean value: 0.7747696587525694 key: test_precision value: [0.52941176 1. 0.69230769 0.66666667 0.66666667 0.57142857 0.77777778 0.7 0.88888889 0.54545455] mean value: 0.7038602573896692 key: train_precision value: [0.76699029 0.71428571 0.75757576 0.78431373 0.76699029 0.79047619 0.78 0.74311927 0.74757282 0.76415094] mean value: 0.7615474995337383 key: test_recall value: [0.81818182 0.83333333 0.81818182 0.72727273 0.54545455 0.72727273 0.63636364 0.63636364 0.72727273 0.54545455] mean value: 0.7015151515151515 key: train_recall value: [0.79 0.75757576 0.75 0.8 0.79 0.83 0.78 0.81 0.77 0.81 ] mean value: 0.7887575757575758 key: test_roc_auc value: [0.57575758 0.91666667 0.72727273 0.68181818 0.63636364 0.59090909 0.72727273 0.68181818 0.81818182 0.54545455] mean value: 0.6901515151515152 key: train_roc_auc value: [0.77378788 0.72878788 0.755 0.79 0.775 0.805 0.78 0.765 0.755 0.78 ] mean value: 0.7707575757575758 key: test_jcc value: [0.47368421 0.83333333 0.6 0.53333333 0.42857143 0.47058824 0.53846154 0.5 0.66666667 0.375 ] mean value: 0.5419638746186733 key: train_jcc value: [0.63709677 0.58139535 0.60483871 0.6557377 0.63709677 0.68032787 0.63934426 0.6328125 0.61111111 0.648 ] mean value: 0.632776105407841 MCC on Blind test: 0.28 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01423049 0.01355338 0.0119102 0.01166868 0.01288462 0.01357532 0.01195931 0.01357794 0.01178098 0.01234269] mean value: 0.012748360633850098 key: score_time value: [0.01121068 0.00992894 0.00999022 0.00975442 0.01017022 0.01056528 0.00981355 0.01052999 0.00942135 0.00952411] mean value: 0.010090875625610351 key: test_mcc value: [0.39393939 0.91666667 0.64715023 0.45454545 0.54772256 0.36514837 0.56694671 0.56694671 0.36514837 0.27272727] mean value: 0.5096941736681291 key: train_mcc value: [0.79917164 0.72948996 0.77313757 0.76015205 0.7500375 0.76244374 0.74133561 0.77034673 0.75093926 0.76015205] mean value: 0.7597206109096323 key: test_accuracy value: [0.69565217 0.95652174 0.81818182 0.72727273 0.77272727 0.68181818 0.77272727 0.77272727 0.68181818 0.63636364] mean value: 0.7515810276679842 key: train_accuracy value: [0.89949749 0.86432161 0.885 0.88 0.875 0.88 0.87 0.885 0.875 0.88 ] mean value: 0.8793819095477386 key: test_fscore value: [0.69565217 0.95652174 0.83333333 0.72727273 0.7826087 0.69565217 0.73684211 0.8 0.69565217 0.63636364] mean value: 0.7559898758754594 key: train_fscore value: [0.8989899 0.86010363 0.87958115 0.88118812 0.87437186 0.88461538 0.86597938 0.88324873 0.87804878 0.87878788] mean value: 0.8784914812172563 key: test_precision value: [0.66666667 1. 
0.76923077 0.72727273 0.75 0.66666667 0.875 0.71428571 0.66666667 0.63636364] mean value: 0.7472152847152848 key: train_precision value: [0.90816327 0.88297872 0.92307692 0.87254902 0.87878788 0.85185185 0.89361702 0.89690722 0.85714286 0.8877551 ] mean value: 0.8852829858989989 key: test_recall value: [0.72727273 0.91666667 0.90909091 0.72727273 0.81818182 0.72727273 0.63636364 0.90909091 0.72727273 0.63636364] mean value: 0.7734848484848484 key: train_recall value: [0.89 0.83838384 0.84 0.89 0.87 0.92 0.84 0.87 0.9 0.87 ] mean value: 0.8728383838383839 key: test_roc_auc value: [0.6969697 0.95833333 0.81818182 0.72727273 0.77272727 0.68181818 0.77272727 0.77272727 0.68181818 0.63636364] mean value: 0.7518939393939393 key: train_roc_auc value: [0.89954545 0.86419192 0.885 0.88 0.875 0.88 0.87 0.885 0.875 0.88 ] mean value: 0.8793737373737374 key: test_jcc value: [0.53333333 0.91666667 0.71428571 0.57142857 0.64285714 0.53333333 0.58333333 0.66666667 0.53333333 0.46666667] mean value: 0.6161904761904762 key: train_jcc value: [0.81651376 0.75454545 0.78504673 0.78761062 0.77678571 0.79310345 0.76363636 0.79090909 0.7826087 0.78378378] mean value: 0.7834543660997322 MCC on Blind test: 0.43 Accuracy on Blind test: 0.78 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.97043252 0.98709679 0.94861197 1.01687264 1.09533525 0.92243099 0.88711429 0.95853901 0.8539257 0.86596727] mean value: 0.950632643699646 key: score_time value: [0.01569128 0.01483226 0.01499128 0.01794028 0.01487207 0.01535821 0.01524711 0.01560092 0.01368737 0.01333737] mean value: 0.015155816078186035 key: test_mcc value: [0.48856385 0.82575758 0.54772256 0.36514837 0.63636364 0.45454545 0.73029674 0.64715023 0.46225016 0.36514837] mean value: 0.5522946956544701 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.91304348 0.77272727 0.68181818 0.81818182 0.72727273 0.86363636 0.81818182 0.72727273 0.68181818] mean value: 0.7743083003952569 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.91666667 0.7826087 0.66666667 0.81818182 0.72727273 0.85714286 0.83333333 0.75 0.69565217] mean value: 0.7797524938829287 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.69230769 0.91666667 0.75 0.7 0.81818182 0.72727273 0.9 0.76923077 0.69230769 0.66666667] mean value: 0.7632634032634033 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 0.81818182 0.63636364 0.81818182 0.72727273 0.81818182 0.90909091 0.81818182 0.72727273] mean value: 0.8007575757575758 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.74242424 0.91287879 0.77272727 0.68181818 0.81818182 0.72727273 0.86363636 0.81818182 0.72727273 0.68181818] mean value: 0.7746212121212122 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.84615385 0.64285714 0.5 0.69230769 0.57142857 0.75 0.71428571 0.6 0.53333333] mean value: 0.64503663003663 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.49 Accuracy on Blind test: 0.78 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, 
num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02525091 0.02160215 0.02029204 0.01865292 0.01829171 0.01900959 0.017452 0.01893425 0.01651978 0.0175488 ] mean value: 0.019355416297912598 key: score_time value: [0.01383924 0.01080704 0.01054907 0.01040816 0.00994539 0.00939131 0.00986552 0.0096252 0.00948071 0.0104351 ] mean value: 0.010434675216674804 key: test_mcc value: [0.56818182 0.83971912 0.54772256 0.46225016 0.54772256 0.18898224 0.81818182 0.63636364 0.63636364 0.64715023] mean value: 0.5892637775815944 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.91304348 0.77272727 0.72727273 0.77272727 0.59090909 0.90909091 0.81818182 0.81818182 0.81818182] mean value: 0.792292490118577 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 0.90909091 0.7826087 0.75 0.7826087 0.52631579 0.90909091 0.81818182 0.81818182 0.8 ] mean value: 0.787868733097566 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 1. 0.75 0.69230769 0.75 0.625 0.90909091 0.81818182 0.81818182 0.88888889] mean value: 0.8001651126651127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.83333333 0.81818182 0.81818182 0.81818182 0.45454545 0.90909091 0.81818182 0.81818182 0.72727273] mean value: 0.7833333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.78409091 0.91666667 0.77272727 0.72727273 0.77272727 0.59090909 0.90909091 0.81818182 0.81818182 0.81818182] mean value: 0.7928030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 0.83333333 0.64285714 0.6 0.64285714 0.35714286 0.83333333 0.69230769 0.69230769 0.66666667] mean value: 0.6603663003663004 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.43 Accuracy on Blind test: 0.78 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11458826 0.10680294 0.10583305 0.1013813 0.09803462 0.09663272 0.09713149 0.10008979 0.09998441 0.10197449] mean value: 0.10224530696868897 key: score_time value: [0.01931691 0.01970983 0.0188899 0.01814747 0.01751494 0.01744604 0.01884842 0.01791477 0.01759744 0.01804161] mean value: 0.01834273338317871 key: test_mcc value: [0.47727273 0.91666667 0.73029674 0.54772256 0.81818182 0.45454545 0.75592895 0.75592895 0.64715023 0.45454545] mean value: 0.6558239543023852 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.73913043 0.95652174 0.86363636 0.77272727 0.90909091 0.72727273 0.86363636 0.86363636 0.81818182 0.72727273] mean value: 0.8241106719367589 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.95652174 0.86956522 0.76190476 0.90909091 0.72727273 0.88 0.88 0.83333333 0.72727273] mean value: 0.8272234142668925 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.72727273 1. 0.83333333 0.8 0.90909091 0.72727273 0.78571429 0.78571429 0.76923077 0.72727273] mean value: 0.8064901764901765 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.91666667 0.90909091 0.72727273 0.90909091 0.72727273 1. 1. 0.90909091 0.72727273] mean value: 0.8553030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73863636 0.95833333 0.86363636 0.77272727 0.90909091 0.72727273 0.86363636 0.86363636 0.81818182 0.72727273] mean value: 0.8242424242424242 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.91666667 0.76923077 0.61538462 0.83333333 0.57142857 0.78571429 0.78571429 0.71428571 0.57142857] mean value: 0.7134615384615385 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.8 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01024938 0.00933266 0.01019812 0.01005816 0.00927067 0.01000428 0.01039481 0.01046586 0.00951385 0.01014614] mean value: 0.009963393211364746 key: score_time value: [0.00944662 0.00883985 0.00952673 0.00927114 0.00905871 0.00973105 0.0096364 0.00956321 0.009341 0.00951195] mean value: 0.009392666816711425 key: test_mcc value: [ 0.3030303 0.66414149 0.36514837 0. 0.54772256 0.2773501 0.37796447 0.36514837 0.36514837 -0.09090909] mean value: 0.31747449437593517 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.65217391 0.82608696 0.68181818 0.5 0.77272727 0.63636364 0.68181818 0.68181818 0.68181818 0.45454545] mean value: 0.6569169960474308 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63636364 0.81818182 0.66666667 0.47619048 0.7826087 0.66666667 0.72 0.66666667 0.66666667 0.45454545] mean value: 0.6554556747600225 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.63636364 0.9 0.7 0.5 0.75 0.61538462 0.64285714 0.7 0.7 0.45454545] mean value: 0.6599150849150849 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.75 0.63636364 0.45454545 0.81818182 0.72727273 0.81818182 0.63636364 0.63636364 0.45454545] mean value: 0.6568181818181819 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65151515 0.82954545 0.68181818 0.5 0.77272727 0.63636364 0.68181818 0.68181818 0.68181818 0.45454545] mean value: 0.6571969696969697 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46666667 0.69230769 0.5 0.3125 0.64285714 0.5 0.5625 0.5 0.5 0.29411765] mean value: 0.49709491488903257 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.39370704 1.41524959 1.35751224 1.38201571 1.38745046 1.37072587 1.37028241 1.33769202 1.36783028 1.3226316 ] mean value: 1.370509719848633 key: score_time value: [0.09871769 0.09867597 0.09723949 0.09882069 0.09869909 0.15835905 0.09611416 0.09787035 0.09561229 0.09823298] mean value: 0.10383417606353759 key: test_mcc value: [0.56818182 0.91666667 0.91287093 0.54772256 0.64715023 0.46225016 0.83205029 0.91287093 0.73029674 0.45454545] mean value: 0.6984605785378183 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.95652174 0.95454545 0.77272727 0.81818182 0.72727273 0.90909091 0.95454545 0.86363636 0.72727273] mean value: 0.8466403162055336 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 0.95652174 0.95238095 0.7826087 0.83333333 0.7 0.9 0.95652174 0.86956522 0.72727273] mean value: 0.8460813099943535 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 1. 1. 0.75 0.76923077 0.77777778 1. 0.91666667 0.83333333 0.72727273] mean value: 0.8524281274281275 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 0.90909091 0.81818182 0.90909091 0.63636364 0.81818182 1. 0.90909091 0.72727273] mean value: 0.8462121212121212 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.78409091 0.95833333 0.95454545 0.77272727 0.81818182 0.72727273 0.90909091 0.95454545 0.86363636 0.72727273] mean value: 0.8469696969696969 key: train_roc_auc value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 0.91666667 0.90909091 0.64285714 0.71428571 0.53846154 0.81818182 0.91666667 0.76923077 0.57142857] mean value: 0.7439726939726939 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.88 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87208843 0.89681029 0.89619946 0.83190942 0.91177535 0.97973609 0.93089819 0.90625811 0.86904597 0.93702197] mean value: 0.9031743288040162 key: score_time value: [0.17905688 0.20153785 0.17578483 0.1625216 0.16287255 0.22656584 0.14112735 0.2074976 0.22874093 0.134022 ] mean value: 0.18197274208068848 key: test_mcc value: [0.47727273 0.91666667 1. 0.63636364 0.81818182 0.64715023 0.83205029 0.91287093 0.73029674 0.2773501 ] mean value: 0.7248203142380238 key: train_mcc value: [0.96989899 0.95998792 0.95042779 0.95042779 0.94018806 0.9700485 0.95042779 0.95042779 0.96076892 0.94018806] mean value: 0.9542791603538276 key: test_accuracy value: [0.73913043 0.95652174 1. 0.81818182 0.90909091 0.81818182 0.90909091 0.95454545 0.86363636 0.63636364] mean value: 0.8604743083003953 key: train_accuracy value: [0.98492462 0.9798995 0.975 0.975 0.97 0.985 0.975 0.975 0.98 0.97 ] mean value: 0.9769824120603015 key: test_fscore value: [0.72727273 0.95652174 1. 0.81818182 0.90909091 0.8 0.9 0.95652174 0.86956522 0.6 ] mean value: 0.8537154150197629 key: train_fscore value: [0.98492462 0.97959184 0.97461929 0.97461929 0.96969697 0.98492462 0.97461929 0.97461929 0.97959184 0.96969697] mean value: 0.9766904016454888 key: test_precision value: [0.72727273 1. 1. 0.81818182 0.90909091 0.88888889 1. 0.91666667 0.83333333 0.66666667] mean value: 0.876010101010101 key: train_precision value: [0.98989899 0.98969072 0.98969072 0.98969072 0.97959184 0.98989899 0.98969072 0.98969072 1. 
0.97959184] mean value: 0.9887435261514791 key: test_recall value: [0.72727273 0.91666667 1. 0.81818182 0.90909091 0.72727273 0.81818182 1. 0.90909091 0.54545455] mean value: 0.8371212121212122 key: train_recall value: [0.98 0.96969697 0.96 0.96 0.96 0.98 0.96 0.96 0.96 0.96 ] mean value: 0.9649696969696969 key: test_roc_auc value: [0.73863636 0.95833333 1. 0.81818182 0.90909091 0.81818182 0.90909091 0.95454545 0.86363636 0.63636364] mean value: 0.8606060606060606 key: train_roc_auc value: [0.98494949 0.97984848 0.975 0.975 0.97 0.985 0.975 0.975 0.98 0.97 ] mean value: 0.976979797979798 key: test_jcc value: [0.57142857 0.91666667 1. 0.69230769 0.83333333 0.66666667 0.81818182 0.91666667 0.76923077 0.42857143] mean value: 0.7613053613053613 key: train_jcc value: [0.97029703 0.96 0.95049505 0.95049505 0.94117647 0.97029703 0.95049505 0.95049505 0.96 0.94117647] mean value: 0.9544927198602213 MCC on Blind test: 0.55 Accuracy on Blind test: 0.86 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01043224 0.00909019 0.01290274 0.01403213 0.01533985 0.01524782 0.01515961 0.01238012 0.01442981 0.01607943] mean value: 0.013509392738342285 key: score_time value: [0.00940108 0.01727343 0.01171255 0.01217628 0.01322937 0.01409483 0.01232147 0.01480937 0.01329613 0.01417756] mean value: 0.01324920654296875 key: test_mcc value: [ 0.39393939 1. 0.46225016 0.18898224 0.46225016 0.54772256 0.2773501 0.36514837 -0.09245003 0.18257419] mean value: 0.3787767137904798 key: train_mcc value: [0.61070966 0.57109279 0.54043252 0.64205788 0.60048058 0.58292193 0.60108292 0.59360222 0.67573429 0.59675165] mean value: 0.601486644761032 key: test_accuracy value: [0.69565217 1. 0.72727273 0.59090909 0.72727273 0.77272727 0.63636364 0.68181818 0.45454545 0.59090909] mean value: 0.6877470355731226 key: train_accuracy value: [0.8040201 0.7839196 0.77 0.82 0.8 0.79 0.8 0.795 0.835 0.795 ] mean value: 0.7992939698492463 key: test_fscore value: [0.69565217 1. 0.75 0.52631579 0.7 0.76190476 0.6 0.69565217 0.4 0.57142857] mean value: 0.6700953470633104 key: train_fscore value: [0.79581152 0.77005348 0.76530612 0.8125 0.79591837 0.77894737 0.79381443 0.78306878 0.82352941 0.77837838] mean value: 0.7897327858678965 key: test_precision value: [0.66666667 1. 0.69230769 0.625 0.77777778 0.8 0.66666667 0.66666667 0.44444444 0.6 ] mean value: 0.6939529914529914 key: train_precision value: [0.83516484 0.81818182 0.78125 0.84782609 0.8125 0.82222222 0.81914894 0.83146067 0.88505747 0.84705882] mean value: 0.8299870867646693 key: test_recall value: [0.72727273 1. 
0.81818182 0.45454545 0.63636364 0.72727273 0.54545455 0.72727273 0.36363636 0.54545455] mean value: 0.6545454545454545 key: train_recall value: [0.76 0.72727273 0.75 0.78 0.78 0.74 0.77 0.74 0.77 0.72 ] mean value: 0.7537272727272727 key: test_roc_auc value: [0.6969697 1. 0.72727273 0.59090909 0.72727273 0.77272727 0.63636364 0.68181818 0.45454545 0.59090909] mean value: 0.6878787878787879 key: train_roc_auc value: [0.80424242 0.78363636 0.77 0.82 0.8 0.79 0.8 0.795 0.835 0.795 ] mean value: 0.7992878787878788 key: test_jcc value: [0.53333333 1. 0.6 0.35714286 0.53846154 0.61538462 0.42857143 0.53333333 0.25 0.4 ] mean value: 0.5256227106227106 key: train_jcc value: [0.66086957 0.62608696 0.61983471 0.68421053 0.66101695 0.63793103 0.65811966 0.64347826 0.7 0.63716814] mean value: 0.6528715803016166 MCC on Blind test: 0.36 Accuracy on Blind test: 0.72 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random 
Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.70163393 0.10867262 0.21027184 1.58362126 0.15635705 0.38029337 1.38273144 0.10912514 0.25329494 1.5216136 ] mean value: 0.6407615184783936 key: score_time value: [0.01133823 0.01368117 0.01305103 0.01234221 0.01410246 0.01277947 0.01145935 0.0133307 0.01315022 0.01216602] mean value: 0.012740087509155274 key: test_mcc value: [0.66414149 0.83971912 1. 0.83205029 0.73029674 0.73029674 0.83205029 0.73029674 0.73029674 0.83205029] mean value: 0.7921198467134848 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82608696 0.91304348 1. 0.90909091 0.86363636 0.86363636 0.90909091 0.86363636 0.86363636 0.90909091] mean value: 0.892094861660079 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.90909091 1. 0.9 0.85714286 0.85714286 0.9 0.86956522 0.86956522 0.9 ] mean value: 0.8895840391492565 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 1. 1. 1. 0.9 0.9 1. 0.83333333 0.83333333 1. ] mean value: 0.9235897435897436 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.83333333 1. 0.81818182 0.81818182 0.81818182 0.81818182 0.90909091 0.90909091 0.81818182] mean value: 0.8651515151515152 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.82954545 0.91666667 1. 
0.90909091 0.86363636 0.86363636 0.90909091 0.86363636 0.86363636 0.90909091] mean value: 0.8928030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.71428571 0.83333333 1. 0.81818182 0.75 0.75 0.81818182 0.76923077 0.76923077 0.81818182] mean value: 0.8040626040626041 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.86 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), 
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05875063 0.02724981 0.03360963 0.03436136 0.06647086 0.03612995 0.0561471 0.06006169 0.05103612 0.07747197] mean value: 0.05012891292572021 key: score_time value: [0.04140472 0.01212978 0.01214314 0.02116013 0.01222682 0.01217699 0.02134228 0.0206449 0.02383661 0.02042603] mean value: 0.019749140739440917 key: test_mcc value: [0.74047959 0.58930667 0.63636364 0.2773501 0.54772256 0.36514837 0.68313005 0.36514837 0.56694671 0.36514837] mean value: 0.5136744424225118 key: train_mcc value: [0.96989899 0.950172 0.9900495 0.9900495 0.98019606 0.9900495 1. 0.97043679 1. 
0.98019606] mean value: 0.9821048409556434 key: test_accuracy value: [0.86956522 0.7826087 0.81818182 0.63636364 0.77272727 0.68181818 0.81818182 0.68181818 0.77272727 0.68181818] mean value: 0.7515810276679842 key: train_accuracy value: [0.98492462 0.97487437 0.995 0.995 0.99 0.995 1. 0.985 1. 0.99 ] mean value: 0.9909798994974874 key: test_fscore value: [0.85714286 0.76190476 0.81818182 0.6 0.76190476 0.66666667 0.77777778 0.69565217 0.8 0.69565217] mean value: 0.7434882991404731 key: train_fscore value: [0.98492462 0.97435897 0.99502488 0.99497487 0.98989899 0.99502488 1. 0.98477157 1. 0.98989899] mean value: 0.9908877776492233 key: test_precision value: [0.9 0.88888889 0.81818182 0.66666667 0.8 0.7 1. 0.66666667 0.71428571 0.66666667] mean value: 0.7821356421356421 key: train_precision value: [0.98989899 0.98958333 0.99009901 1. 1. 0.99009901 1. 1. 1. 1. ] mean value: 0.9959680343034304 key: test_recall value: [0.81818182 0.66666667 0.81818182 0.54545455 0.72727273 0.63636364 0.63636364 0.72727273 0.90909091 0.72727273] mean value: 0.7212121212121212 key: train_recall value: [0.98 0.95959596 1. 0.99 0.98 1. 1. 0.97 1. 0.98 ] mean value: 0.9859595959595959 key: test_roc_auc value: [0.86742424 0.78787879 0.81818182 0.63636364 0.77272727 0.68181818 0.81818182 0.68181818 0.77272727 0.68181818] mean value: 0.7518939393939393 key: train_roc_auc value: [0.98494949 0.97479798 0.995 0.995 0.99 0.995 1. 0.985 1. 0.99 ] mean value: 0.9909747474747475 key: test_jcc value: [0.75 0.61538462 0.69230769 0.42857143 0.61538462 0.5 0.63636364 0.53333333 0.66666667 0.53333333] mean value: 0.5971345321345322 key: train_jcc value: [0.97029703 0.95 0.99009901 0.99 0.98 0.99009901 1. 0.97 1. 
0.98 ] mean value: 0.982049504950495 MCC on Blind test: 0.43 Accuracy on Blind test: 0.78 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02285457 0.00959587 0.01455879 0.0161109 0.01507545 0.01554322 0.01422286 0.01510739 0.01378608 0.01424479] mean value: 0.015109992027282715 key: score_time value: [0.0105598 0.00911903 0.01251507 0.01496482 0.01408434 0.01389599 0.01317477 0.01339459 0.01391745 0.01329112] mean value: 0.012891697883605956 key: test_mcc value: [0.47727273 0.91605722 0.56694671 0.46225016 0.56694671 0.54772256 0.56694671 0.54772256 0. 
0.27272727] mean value: 0.49245926319015676 key: train_mcc value: [0.60824753 0.56783514 0.61076393 0.62111902 0.60048058 0.56 0.61076393 0.60012004 0.59073889 0.55024767] mean value: 0.5920316725564911 key: test_accuracy value: [0.73913043 0.95652174 0.77272727 0.72727273 0.77272727 0.77272727 0.77272727 0.77272727 0.5 0.63636364] mean value: 0.742292490118577 key: train_accuracy value: [0.8040201 0.7839196 0.805 0.81 0.8 0.78 0.805 0.8 0.795 0.775 ] mean value: 0.7957939698492462 key: test_fscore value: [0.72727273 0.96 0.8 0.7 0.73684211 0.76190476 0.73684211 0.7826087 0.47619048 0.63636364] mean value: 0.7318024507910091 key: train_fscore value: [0.80788177 0.78172589 0.8 0.81553398 0.79591837 0.78 0.8 0.7979798 0.8 0.77832512] mean value: 0.7957364930785858 key: test_precision value: [0.72727273 0.92307692 0.71428571 0.77777778 0.875 0.8 0.875 0.75 0.5 0.63636364] mean value: 0.7578776778776779 key: train_precision value: [0.7961165 0.78571429 0.82105263 0.79245283 0.8125 0.78 0.82105263 0.80612245 0.78095238 0.76699029] mean value: 0.7962954005109337 key: test_recall value: [0.72727273 1. 
0.90909091 0.63636364 0.63636364 0.72727273 0.63636364 0.81818182 0.45454545 0.63636364] mean value: 0.7181818181818181 key: train_recall value: [0.82 0.77777778 0.78 0.84 0.78 0.78 0.78 0.79 0.82 0.79 ] mean value: 0.7957777777777778 key: test_roc_auc value: [0.73863636 0.95454545 0.77272727 0.72727273 0.77272727 0.77272727 0.77272727 0.77272727 0.5 0.63636364] mean value: 0.7420454545454546 key: train_roc_auc value: [0.80393939 0.78388889 0.805 0.81 0.8 0.78 0.805 0.8 0.795 0.775 ] mean value: 0.7957828282828283 key: test_jcc value: [0.57142857 0.92307692 0.66666667 0.53846154 0.58333333 0.61538462 0.58333333 0.64285714 0.3125 0.46666667] mean value: 0.5903708791208792 key: train_jcc value: [0.67768595 0.64166667 0.66666667 0.68852459 0.66101695 0.63934426 0.66666667 0.66386555 0.66666667 0.63709677] mean value: 0.6609200739103485 MCC on Blind test: 0.39 Accuracy on Blind test: 0.75 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01830077 0.01933861 0.01750112 0.01849151 0.01687765 0.01699734 0.01950169 0.01714587 0.01839066 0.01669145] mean value: 0.017923665046691895 key: score_time value: [0.01280808 0.01439786 0.0138402 0.01365709 0.01333284 0.01249266 0.01363063 0.01358247 0.01361966 0.01381898] mean value: 0.013518047332763673 key: test_mcc value: [0.65909298 0.91666667 0.61237244 0.2773501 0.68313005 0.37796447 0.37796447 0.73029674 0.75592895 0.31622777] mean value: 0.5706994635298601 key: train_mcc value: [0.69567269 0.82317181 0.63910148 0.70471677 0.66245673 0.83070192 0.67028006 0.77908775 0.83710367 0.4843221 ] mean value: 0.7126614996184624 key: test_accuracy value: [0.82608696 0.95652174 0.77272727 0.63636364 0.81818182 0.68181818 0.68181818 0.86363636 0.86363636 0.59090909] mean value: 0.7691699604743083 key: train_accuracy value: [0.82914573 0.90954774 0.79 0.835 0.805 0.91 0.81 0.88 0.915 0.69 ] mean value: 0.8373693467336684 key: test_fscore value: [0.8 0.95652174 0.81481481 0.66666667 0.77777778 0.63157895 0.72 0.85714286 0.88 0.70967742] mean value: 0.7814180222255811 key: train_fscore value: [0.79761905 0.90425532 0.82644628 0.85714286 0.75776398 0.90217391 0.84033613 0.86516854 0.92018779 0.76335878] mean value: 0.8434452638934143 key: test_precision value: [0.88888889 1. 0.6875 0.61538462 1. 0.75 0.64285714 0.9 0.78571429 0.55 ] mean value: 0.7820344932844933 key: train_precision value: [0.98529412 0.95505618 0.70422535 0.75572519 1. 
0.98809524 0.72463768 0.98717949 0.86725664 0.61728395] mean value: 0.8584753834594282 key: test_recall value: [0.72727273 0.91666667 1. 0.72727273 0.63636364 0.54545455 0.81818182 0.81818182 1. 1. ] mean value: 0.818939393939394 key: train_recall value: [0.67 0.85858586 1. 0.99 0.61 0.83 1. 0.77 0.98 1. ] mean value: 0.8708585858585859 key: test_roc_auc value: [0.8219697 0.95833333 0.77272727 0.63636364 0.81818182 0.68181818 0.68181818 0.86363636 0.86363636 0.59090909] mean value: 0.7689393939393939 key: train_roc_auc value: [0.82994949 0.90929293 0.79 0.835 0.805 0.91 0.81 0.88 0.915 0.69 ] mean value: 0.8374242424242424 key: test_jcc value: [0.66666667 0.91666667 0.6875 0.5 0.63636364 0.46153846 0.5625 0.75 0.78571429 0.55 ] mean value: 0.6516949716949717 key: train_jcc value: [0.66336634 0.82524272 0.70422535 0.75 0.61 0.82178218 0.72463768 0.76237624 0.85217391 0.61728395] mean value: 0.7331088367854708 MCC on Blind test: 0.4 Accuracy on Blind test: 0.71 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01732206 0.01706791 0.01785231 0.01656723 0.01566243 0.01723099 0.01307607 0.01557326 0.01516747 0.02194858] mean value: 0.016746830940246583 key: score_time value: [0.01364207 0.01363683 0.01368928 0.01371098 0.01355767 0.01361108 0.012398 0.01225924 0.01247382 0.01716232] mean value: 0.013614130020141602 key: test_mcc value: [0.47727273 0.82575758 0.75592895 0.20412415 0.68313005 0.45454545 0.81818182 0.31622777 0.31622777 0.48795004] mean value: 0.5339346286579878 key: train_mcc value: [0.93007986 0.72825731 0.89040077 0.79676132 0.63800912 0.87354505 0.7726195 0.43643578 0.35156152 0.88070485] mean value: 0.7298375083788813 key: test_accuracy value: [0.73913043 0.91304348 0.86363636 0.59090909 0.81818182 0.72727273 0.90909091 0.59090909 0.59090909 0.72727273] mean value: 0.7470355731225297 key: train_accuracy value: [0.96482412 0.84924623 0.945 0.895 0.8 0.935 0.88 0.66 0.61 0.94 ] mean value: 0.8479070351758794 key: test_fscore value: [0.72727273 0.91666667 0.88 0.47058824 0.84615385 0.72727273 0.90909091 0.70967742 0.70967742 0.76923077] mean value: 0.7665630719691441 key: train_fscore value: [0.96446701 0.86725664 0.94581281 0.88770053 0.82905983 0.93779904 0.88990826 0.74626866 0.71942446 0.94117647] mean value: 0.8728873701624488 key: test_precision value: [0.72727273 0.91666667 0.78571429 0.66666667 0.73333333 0.72727273 0.90909091 0.55 0.55 0.66666667] mean value: 0.7232683982683983 key: train_precision value: [0.97938144 0.77165354 0.93203883 0.95402299 0.7238806 0.89908257 0.8220339 0.5952381 0.56179775 0.92307692] mean value: 
0.8162206645314616 key: test_recall value: [0.72727273 0.91666667 1. 0.36363636 1. 0.72727273 0.90909091 1. 1. 0.90909091] mean value: 0.8553030303030303 key: train_recall value: [0.95 0.98989899 0.96 0.83 0.97 0.98 0.97 1. 1. 0.96 ] mean value: 0.960989898989899 key: test_roc_auc value: [0.73863636 0.91287879 0.86363636 0.59090909 0.81818182 0.72727273 0.90909091 0.59090909 0.59090909 0.72727273] mean value: 0.746969696969697 key: train_roc_auc value: [0.96489899 0.84994949 0.945 0.895 0.8 0.935 0.88 0.66 0.61 0.94 ] mean value: 0.8479848484848485 key: test_jcc value: [0.57142857 0.84615385 0.78571429 0.30769231 0.73333333 0.57142857 0.83333333 0.55 0.55 0.625 ] mean value: 0.6374084249084249 key: train_jcc value: [0.93137255 0.765625 0.89719626 0.79807692 0.7080292 0.88288288 0.80165289 0.5952381 0.56179775 0.88888889] mean value: 0.7830760443239905 MCC on Blind test: 0.39 Accuracy on Blind test: 0.8 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.14788842 0.12276125 0.11564207 0.11369634 0.17145896 0.15707755 0.17124152 0.11453056 0.11493278 0.1138854 ] mean value: 0.13431148529052733 key: score_time value: [0.01740265 0.01460624 0.01493835 0.01787996 0.02255607 0.02203274 0.01534486 0.0149529 0.01473022 0.01477933] mean value: 0.01692233085632324 key: test_mcc value: [0.82575758 0.83971912 1. 0.64715023 0.81818182 0.46225016 0.73029674 0.73029674 0.73029674 0.54772256] mean value: 0.7331671696675314 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.91304348 1. 0.81818182 0.90909091 0.72727273 0.86363636 0.86363636 0.86363636 0.77272727] mean value: 0.8644268774703557 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.90909091 1. 0.83333333 0.90909091 0.75 0.85714286 0.86956522 0.86956522 0.76190476] mean value: 0.8668784114436289 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 1. 1. 0.76923077 0.90909091 0.69230769 0.9 0.83333333 0.83333333 0.8 ] mean value: 0.8646386946386947 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.83333333 1. 0.90909091 0.90909091 0.81818182 0.81818182 0.90909091 0.90909091 0.72727273] mean value: 0.8742424242424243 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91287879 0.91666667 1. 
0.81818182 0.90909091 0.72727273 0.86363636 0.86363636 0.86363636 0.77272727] mean value: 0.8647727272727272 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.83333333 1. 0.71428571 0.83333333 0.6 0.75 0.76923077 0.76923077 0.61538462] mean value: 0.7718131868131868 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.88 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04062152 0.0381217 0.04360723 0.03369117 0.057724 0.04869604 0.0588851 0.06322312 0.04744148 0.04814744] mean value: 0.0480158805847168 key: score_time value: [0.01854205 0.01946282 0.01998758 0.02527785 0.03004384 0.02557468 0.03072119 0.02437186 0.02554321 0.02506495] mean value: 0.024459004402160645 key: test_mcc value: [0.74242424 0.76764947 0.91287093 0.73029674 0.81818182 0.45454545 0.83205029 0.54772256 0.73029674 0.75592895] mean value: 0.7291967202447438 key: train_mcc value: [0.95979798 0.96056672 0.96076892 0.9900495 0.96 0.96 0.98 0.97043679 0.96019206 0.97043679] mean value: 0.9672248773218128 key: test_accuracy value: [0.86956522 0.86956522 0.95454545 
0.86363636 0.90909091 0.72727273 0.90909091 0.77272727 0.86363636 0.86363636] mean value: 0.8602766798418973 key: train_accuracy value: [0.9798995 0.9798995 0.98 0.995 0.98 0.98 0.99 0.985 0.98 0.985 ] mean value: 0.9834798994974874 key: test_fscore value: [0.86956522 0.85714286 0.95238095 0.85714286 0.90909091 0.72727273 0.9 0.7826087 0.86956522 0.84210526] mean value: 0.856687469662298 key: train_fscore value: [0.98 0.97938144 0.97959184 0.99497487 0.98 0.98 0.99 0.98477157 0.97979798 0.98477157] mean value: 0.9833289281411624 key: test_precision value: [0.83333333 1. 1. 0.9 0.90909091 0.72727273 1. 0.75 0.83333333 1. ] mean value: 0.8953030303030303 key: train_precision value: [0.98 1. 1. 1. 0.98 0.98 0.99 1. 0.98979592 1. ] mean value: 0.9919795918367347 key: test_recall value: [0.90909091 0.75 0.90909091 0.81818182 0.90909091 0.72727273 0.81818182 0.81818182 0.90909091 0.72727273] mean value: 0.8295454545454546 key: train_recall value: [0.98 0.95959596 0.96 0.99 0.98 0.98 0.99 0.97 0.97 0.97 ] mean value: 0.9749595959595959 key: test_roc_auc value: [0.87121212 0.875 0.95454545 0.86363636 0.90909091 0.72727273 0.90909091 0.77272727 0.86363636 0.86363636] mean value: 0.8609848484848485 key: train_roc_auc value: [0.97989899 0.97979798 0.98 0.995 0.98 0.98 0.99 0.985 0.98 0.985 ] mean value: 0.983469696969697 key: test_jcc value: [0.76923077 0.75 0.90909091 0.75 0.83333333 0.57142857 0.81818182 0.64285714 0.76923077 0.72727273] mean value: 0.7540626040626041 key: train_jcc value: [0.96078431 0.95959596 0.96 0.99 0.96078431 0.96078431 0.98019802 0.97 0.96039604 0.97 ] mean value: 0.9672542960178371 MCC on Blind test: 0.63 Accuracy on Blind test: 0.88 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.17059803 0.11714387 0.06824756 0.07706714 0.07887411 0.07943869 0.07819271 0.09146166 0.07394814 0.03088427] mean value: 0.08658561706542969 key: score_time value: [0.02730536 0.01919198 0.0194664 0.02080297 0.02410865 0.02245116 0.01826978 0.02425981 0.01341414 0.01849532] mean value: 0.02077655792236328 key: test_mcc value: [0.41096386 0.6992059 0.46225016 0.18898224 0.37796447 0.18257419 0.63636364 0.18257419 0.36514837 0.18257419] mean value: 0.3688601196946537 key: train_mcc value: [0.99 0.98999899 0.9900495 0.9900495 0.9900495 1. 0.9900495 0.9900495 0.9900495 0.9900495 ] mean value: 0.9910345520838135 key: test_accuracy value: [0.69565217 0.82608696 0.72727273 0.59090909 0.68181818 0.59090909 0.81818182 0.59090909 0.68181818 0.59090909] mean value: 0.6794466403162055 key: train_accuracy value: [0.99497487 0.99497487 0.995 0.995 0.995 1. 0.995 0.995 0.995 0.995 ] mean value: 0.9954949748743719 key: test_fscore value: [0.72 0.8 0.7 0.52631579 0.63157895 0.60869565 0.81818182 0.60869565 0.66666667 0.57142857] mean value: 0.6651563097466988 key: train_fscore value: [0.99497487 0.99492386 0.99497487 0.99497487 0.99497487 1. 0.99497487 0.99497487 0.99497487 0.99497487] mean value: 0.9954722852842894 key: test_precision value: [0.64285714 1. 0.77777778 0.625 0.75 0.58333333 0.81818182 0.58333333 0.7 0.6 ] mean value: 0.7080483405483405 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.81818182 0.66666667 0.63636364 0.45454545 0.54545455 0.63636364 0.81818182 0.63636364 0.63636364 0.54545455] mean value: 0.6393939393939394 key: train_recall value: [0.99 0.98989899 0.99 0.99 0.99 1. 0.99 0.99 0.99 0.99 ] mean value: 0.990989898989899 key: test_roc_auc value: [0.70075758 0.83333333 0.72727273 0.59090909 0.68181818 0.59090909 0.81818182 0.59090909 0.68181818 0.59090909] mean value: 0.6806818181818182 key: train_roc_auc value: [0.995 0.99494949 0.995 0.995 0.995 1. 0.995 0.995 0.995 0.995 ] mean value: 0.9954949494949495 key: test_jcc value: [0.5625 0.66666667 0.53846154 0.35714286 0.46153846 0.4375 0.69230769 0.4375 0.5 0.4 ] mean value: 0.5053617216117217 key: train_jcc value: [0.99 0.98989899 0.99 0.99 0.99 1. 0.99 0.99 0.99 0.99 ] mean value: 0.990989898989899 MCC on Blind test: 0.57 Accuracy on Blind test: 0.85 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.38503289 0.49508715 0.3759234 0.38898635 0.38849163 0.45909381 0.41848111 0.39247417 0.40462685 0.38324094] mean value: 0.40914382934570315 key: score_time value: [0.01456547 0.01054454 0.00914955 0.01094794 0.0096848 0.01354051 0.01003337 0.01007342 0.00932431 0.01131916] mean value: 0.010918307304382324 key: test_mcc value: [0.74242424 0.76764947 0.83205029 0.73029674 0.73029674 0.46225016 0.83205029 0.73029674 0.73029674 0.75592895] mean value: 0.7313540387579033 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.86956522 0.90909091 0.86363636 0.86363636 0.72727273 0.90909091 0.86363636 0.86363636 0.86363636] mean value: 0.8602766798418973 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86956522 0.85714286 0.9 0.85714286 0.85714286 0.75 0.9 0.86956522 0.86956522 0.84210526] mean value: 0.857222948676038 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 1. 1. 0.9 0.9 0.69230769 1. 0.83333333 0.83333333 1. ] mean value: 0.8992307692307693 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.75 0.81818182 0.81818182 0.81818182 0.81818182 0.81818182 0.90909091 0.90909091 0.72727273] mean value: 0.8295454545454546 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.87121212 0.875 0.90909091 0.86363636 0.86363636 0.72727273 0.90909091 0.86363636 0.86363636 0.86363636] mean value: 0.8609848484848485 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76923077 0.75 0.81818182 0.75 0.75 0.6 0.81818182 0.76923077 0.76923077 0.72727273] mean value: 0.7521328671328672 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.66 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03351307 0.04224443 0.04674196 0.04314137 0.04148889 0.04995418 0.06079268 0.07106638 0.06821966 0.14230299] mean value: 0.05994656085968018 key: score_time value: [0.0161593 0.01277709 0.01891351 0.01652718 0.01363158 0.01789951 0.01280761 0.01269722 0.02262688 0.01470828] mean value: 0.015874814987182618 key: test_mcc value: [ 0.03816905 0.25495628 -0.09090909 0. 0.10846523 0.18257419 0.2773501 -0.23570226 0.37796447 0.10846523] mean value: 0.10213331963023858 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.52173913 0.60869565 0.45454545 0.5 0.54545455 0.59090909 0.63636364 0.40909091 0.68181818 0.54545455] mean value: 0.5494071146245059 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.47619048 0.52631579 0.45454545 0.52173913 0.375 0.60869565 0.6 0.13333333 0.63157895 0.375 ] mean value: 0.4702398783520065 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.71428571 0.45454545 0.5 0.6 0.58333333 0.66666667 0.25 0.75 0.6 ] mean value: 0.5618831168831169 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.45454545 0.41666667 0.45454545 0.54545455 0.27272727 0.63636364 0.54545455 0.09090909 0.54545455 0.27272727] mean value: 0.42348484848484846 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.51893939 0.61742424 0.45454545 0.5 0.54545455 0.59090909 0.63636364 0.40909091 0.68181818 0.54545455] mean value: 0.5499999999999999 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.3125 0.35714286 0.29411765 0.35294118 0.23076923 0.4375 0.42857143 0.07142857 0.46153846 0.23076923] mean value: 0.3177278603749192 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.78 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03369212 0.02714562 0.03655577 0.0427351 0.0424602 0.03337955 0.02429128 0.01469707 0.01462412 0.01471472] mean value: 0.028429555892944335 key: score_time value: [0.03666997 0.02219319 0.03791785 0.0199151 0.03027272 0.02427435 0.0124104 0.01242089 0.01250958 0.01247907] mean value: 0.022106313705444337 key: test_mcc value: [0.47727273 0.91666667 0.54772256 0.2773501 0.73029674 0.46225016 0.73029674 0.73029674 0.64715023 0.63636364] mean value: 0.6155666308391934 key: train_mcc value: [0.91071836 0.88983239 0.92166048 0.94018806 0.89040077 0.94018806 0.90162439 0.90072087 0.90072087 0.91040978] mean value: 0.9106464009703785 key: test_accuracy value: [0.73913043 0.95652174 0.77272727 0.63636364 0.86363636 0.72727273 0.86363636 0.86363636 0.81818182 0.81818182] mean value: 0.8059288537549407 key: train_accuracy value: [0.95477387 0.94472362 0.96 0.97 0.945 0.97 0.95 0.95 0.95 0.955 ] mean value: 0.9549497487437185 key: test_fscore value: [0.72727273 0.95652174 0.7826087 0.6 0.86956522 0.7 0.85714286 0.86956522 0.83333333 0.81818182] mean value: 0.8014191605495953 key: train_fscore value: [0.95384615 
0.94358974 0.95876289 0.96969697 0.94416244 0.96969697 0.94845361 0.94897959 0.94897959 0.95431472] mean value: 0.9540482672709073 key: test_precision value: [0.72727273 1. 0.75 0.66666667 0.83333333 0.77777778 0.9 0.83333333 0.76923077 0.81818182] mean value: 0.8075796425796427 key: train_precision value: [0.97894737 0.95833333 0.9893617 0.97959184 0.95876289 0.97959184 0.9787234 0.96875 0.96875 0.96907216] mean value: 0.9729884533153145 key: test_recall value: [0.72727273 0.91666667 0.81818182 0.54545455 0.90909091 0.63636364 0.81818182 0.90909091 0.90909091 0.81818182] mean value: 0.8007575757575758 /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:176: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:179: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_recall value: [0.93 0.92929293 0.93 0.96 0.93 0.96 0.92 0.93 0.93 0.94 ] mean value: 0.9359292929292929 key: test_roc_auc value: [0.73863636 0.95833333 0.77272727 0.63636364 0.86363636 0.72727273 0.86363636 0.86363636 0.81818182 0.81818182] mean value: 0.806060606060606 key: train_roc_auc value: [0.95489899 0.94464646 0.96 0.97 0.945 0.97 0.95 0.95 0.95 0.955 ] mean value: 0.9549545454545455 key: test_jcc value: [0.57142857 0.91666667 0.64285714 0.42857143 0.76923077 0.53846154 0.75 0.76923077 0.71428571 0.69230769] mean value: 0.6793040293040293 key: train_jcc value: [0.91176471 0.89320388 0.92079208 0.94117647 0.89423077 0.94117647 
0.90196078 0.90291262 0.90291262 0.91262136] mean value: 0.9122751765248133 MCC on Blind test: 0.58 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.46009588 0.26499319 0.54566193 0.41685939 0.44179821 0.32088614 0.25617456 0.66312885 0.47220087 0.45935988] mean value: 0.4301158905029297 key: score_time value: [0.02258277 0.02649212 0.03821397 0.03347635 0.02990937 0.011235 0.02311802 0.02083945 0.03014112 0.02245569] mean value: 0.025846385955810548 key: test_mcc value: [0.47727273 0.91666667 0.54772256 0.2773501 0.73029674 0.46225016 0.73029674 0.73029674 0.64715023 0.63636364] mean value: 0.6155666308391934 key: train_mcc value: [0.76922303 0.88983239 0.92166048 0.96019206 0.89040077 0.94018806 0.90162439 0.90072087 0.90072087 0.91040978] mean value: 0.8984972680700725 key: test_accuracy value: [0.73913043 0.95652174 0.77272727 0.63636364 0.86363636 0.72727273 0.86363636 0.86363636 0.81818182 0.81818182] mean value: 0.8059288537549407 key: train_accuracy value: [0.88442211 0.94472362 0.96 0.98 0.945 0.97 0.95 0.95 0.95 0.955 ] mean value: 0.9489145728643216 key: test_fscore value: [0.72727273 0.95652174 0.7826087 0.6 0.86956522 0.7 0.85714286 0.86956522 0.83333333 0.81818182] mean value: 
0.8014191605495953 key: train_fscore value: [0.88324873 0.94358974 0.95876289 0.97979798 0.94416244 0.96969697 0.94845361 0.94897959 0.94897959 0.95431472] mean value: 0.9479986259928396 key: test_precision value: [0.72727273 1. 0.75 0.66666667 0.83333333 0.77777778 0.9 0.83333333 0.76923077 0.81818182] mean value: 0.8075796425796427 key: train_precision value: [0.89690722 0.95833333 0.9893617 0.98979592 0.95876289 0.97959184 0.9787234 0.96875 0.96875 0.96907216] mean value: 0.965804846285959 key: test_recall value: [0.72727273 0.91666667 0.81818182 0.54545455 0.90909091 0.63636364 0.81818182 0.90909091 0.90909091 0.81818182] mean value: 0.8007575757575758 key: train_recall value: [0.87 0.92929293 0.93 0.97 0.93 0.96 0.92 0.93 0.93 0.94 ] mean value: 0.9309292929292929 key: test_roc_auc value: [0.73863636 0.95833333 0.77272727 0.63636364 0.86363636 0.72727273 0.86363636 0.86363636 0.81818182 0.81818182] mean value: 0.806060606060606 key: train_roc_auc value: [0.88449495 0.94464646 0.96 0.98 0.945 0.97 0.95 0.95 0.95 0.955 ] mean value: 0.9489141414141414 key: test_jcc value: [0.57142857 0.91666667 0.64285714 0.42857143 0.76923077 0.53846154 0.75 0.76923077 0.71428571 0.69230769] mean value: 0.6793040293040293 key: train_jcc value: [0.79090909 0.89320388 0.92079208 0.96039604 0.89423077 0.94117647 0.90196078 0.90291262 0.90291262 0.91262136] mean value: 0.9021115719290596 MCC on Blind test: 0.58 Accuracy on Blind test: 0.88 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.11089849 0.12921739 0.10791731 0.14299321 0.28428626 0.17262435 0.15614367 0.13838696 0.15337706 0.13365555] mean value: 0.15295002460479737 key: score_time value: [0.01749301 0.02020574 0.019068 0.03575182 0.02044344 0.01739883 0.02974224 0.02896857 0.0146215 0.01770163] mean value: 0.022139477729797363 key: test_mcc value: [0.812277 0.78107015 0.84016287 0.82614456 0.73529412 0.78632938 0.76603235 0.80961181 0.82352941 0.82352941] mean value: 0.8003981065072887 key: train_mcc value: [0.82559993 0.83701936 0.83389761 0.84190012 0.84039088 0.82747132 0.83230165 0.82736156 0.84202959 0.83063652] mean value: 0.8338608549501372 key: test_accuracy value: [0.90510949 0.89051095 0.91970803 0.91240876 0.86764706 0.88970588 0.88235294 0.90441176 0.91176471 0.91176471] mean value: 0.8995384285100901 key: train_accuracy value: [0.91279544 0.91850041 0.91687042 0.9209454 0.92019544 0.91368078 0.91612378 0.91368078 0.92100977 0.91530945] mean value: 0.9169111654441727 key: test_fscore value: [0.90076336 0.88888889 0.92198582 0.91549296 0.86764706 0.89655172 0.88571429 0.90225564 0.91176471 0.91176471] mean value: 0.9002829140555026 key: train_fscore value: [0.9130788 0.91830065 0.91598023 0.92068684 0.92019544 0.91297209 0.91564292 0.91368078 0.92081633 0.91558442] mean value: 0.9166938482254936 key: test_precision value: [0.93650794 0.89552239 0.90277778 0.89041096 0.86764706 0.84415584 0.86111111 0.92307692 0.91176471 0.91176471] mean value: 0.8944739410181639 key: train_precision value: [0.910859 0.92131148 0.92512479 0.92295082 0.92019544 0.9205298 
0.92092257 0.91368078 0.92307692 0.91262136] mean value: 0.9191272957372615 key: test_recall value: [0.86764706 0.88235294 0.94202899 0.94202899 0.86764706 0.95588235 0.91176471 0.88235294 0.91176471 0.91176471] mean value: 0.9075234441602728 key: train_recall value: [0.91530945 0.91530945 0.90701468 0.91843393 0.92019544 0.90553746 0.91042345 0.91368078 0.91856678 0.91856678] mean value: 0.9143038189924066 key: test_roc_auc value: [0.90483802 0.89045183 0.9195439 0.91219096 0.86764706 0.88970588 0.88235294 0.90441176 0.91176471 0.91176471] mean value: 0.899467178175618 key: train_roc_auc value: [0.91279339 0.91850301 0.91686239 0.92094335 0.92019544 0.91368078 0.91612378 0.91368078 0.92100977 0.91530945] mean value: 0.9169102135596282 key: test_jcc value: [0.81944444 0.8 0.85526316 0.84415584 0.76623377 0.8125 0.79487179 0.82191781 0.83783784 0.83783784] mean value: 0.819006249149544 key: train_jcc value: [0.84005979 0.8489426 0.8449848 0.8530303 0.85218703 0.83987915 0.84441088 0.84107946 0.85325265 0.84431138] mean value: 0.8462138038269915 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [3.5434382 3.34984708 3.27759814 2.81686544 4.5030005 3.56405783 2.97054791 2.67048979 3.13669658 3.85361886]
mean value: 3.3686160326004027

key: score_time
value: [0.0126543 0.02569604 0.02097726 0.02703667 0.02031159 0.01380563 0.01514602 0.02036047 0.01601267 0.01590753]
mean value: 0.018790817260742186

key: test_mcc
value: [0.91277477 0.89869927 0.92709446 0.88654289 0.79967098 0.8623165 0.83905224 0.8722811 0.82675403 0.94280904]
mean value: 0.876799528398134

key: train_mcc
value: [0.89121524 0.9502579 0.89924844 0.95171372 0.95938646 0.95199397 0.90584507 0.95827005 0.92365982 0.9534379 ]
mean value: 0.9345028559218945

key: test_accuracy
value: [0.95620438 0.94890511 0.96350365 0.94160584 0.89705882 0.92647059 0.91911765 0.93382353 0.91176471 0.97058824]
mean value: 0.9369042507513955

key: train_accuracy
value: [0.94539527 0.97473513 0.94947025 0.97555012 0.97964169 0.97557003 0.95276873 0.97882736 0.96172638 0.97638436]
mean value: 0.9670069341021373

key: test_fscore
value: [0.95522388 0.94964029 0.96402878 0.94444444 0.90277778 0.93150685 0.92086331 0.93706294 0.91549296 0.97142857]
mean value: 0.9392469792473013

key: train_fscore
value: [0.94627105 0.97525938 0.95008052 0.97596154 0.97978981 0.97607656 0.95337621 0.9792 0.96212732 0.97681855]
mean value: 0.967496091849667

key: test_precision
value: [0.96969697 0.92957746 0.95714286 0.90666667 0.85526316 0.87179487 0.90140845 0.89333333 0.87837838 0.94444444]
mean value: 0.9107706594845216

key: train_precision
value: [0.93206951 0.95618153 0.93799682 0.95905512 0.97271268 0.95625 0.94126984 0.96226415 0.95215311 0.95918367]
mean value: 0.9529136438683203

key: test_recall
value: [0.94117647 0.97058824 0.97101449 0.98550725 0.95588235 1. 0.94117647 0.98529412 0.95588235 1. ]
mean value: 0.9706521739130435

key: train_recall
value: [0.96091205 0.99511401 0.96247961 0.99347471 0.98697068 0.99674267 0.96579805 0.99674267 0.9723127 0.99511401]
mean value: 0.9825661163392511

key: test_roc_auc
value: [0.95609548 0.94906223 0.96344842 0.94128303 0.89705882 0.92647059 0.91911765 0.93382353 0.91176471 0.97058824]
mean value: 0.9368712702472294

key: train_roc_auc
value: [0.94538262 0.9747185 0.94948085 0.97556472 0.97964169 0.97557003 0.95276873 0.97882736 0.96172638 0.97638436]
mean value: 0.9670065252854813

key: test_jcc
value: [0.91428571 0.90410959 0.93055556 0.89473684 0.82278481 0.87179487 0.85333333 0.88157895 0.84415584 0.94444444]
mean value: 0.8861779952211126

key: train_jcc
value: [0.89802131 0.9517134 0.90490798 0.95305164 0.96038035 0.95327103 0.9109063 0.95924765 0.92701863 0.9546875 ]
mean value: 0.9373205780408035

MCC on Blind test: 0.61
Accuracy on Blind test: 0.89

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02296066 0.01715469 0.01691723 0.01694107 0.01650357 0.01609659 0.01617169 0.01619339 0.01607752 0.01604986] mean value: 0.01710662841796875 key: score_time value: [0.02127862 0.01170492 0.01152253 0.01181746 0.01124525 0.01138163 0.01126766 0.0113914 0.0113976 0.01129961] mean value: 0.012430667877197266 key: test_mcc value: [0.54864511 0.66746486 0.63690876 0.66496068 0.65417114 0.63406934 0.58925565 0.57408838 0.65737574 0.71081865] mean value: 0.6337758300707486 key: train_mcc value: [0.66514107 0.64307836 0.65869844 0.6519476 0.62878522 0.64676878 0.65690484 0.63901601 0.66318405 0.6395367 ] mean value: 0.649306107344115 key: test_accuracy value: [0.77372263 0.83211679 0.81751825 0.83211679 0.82352941 0.81617647 0.79411765 0.78676471 0.82352941 0.85294118] mean value: 0.8152533276084156 key: train_accuracy value: [0.83211084 0.8207009 0.82885086 0.82559087 0.80863192 0.82247557 0.82736156 0.81840391 0.83143322 0.81921824] mean value: 0.8234777893700108 key: test_fscore value: [0.76335878 0.82170543 0.81203008 0.82962963 0.80952381 0.82269504 0.78787879 0.78195489 0.80645161 0.84375 ] mean value: 0.8078978042785004 key: train_fscore value: [0.8277592 0.81418919 0.8238255 0.82107023 0.78847885 0.81556684 0.82003396 0.81053526 0.82878412 0.81375839] mean value: 0.8164001531098434 key: test_precision value: [0.79365079 0.86885246 0.84375 0.84848485 0.87931034 0.79452055 0.8125 0.8 0.89285714 0.9 ] mean value: 0.8433926136781971 key: train_precision value: [0.85051546 0.84561404 0.84801382 0.84219554 0.88128773 0.84859155 0.85638298 0.84724689 0.84201681 0.83910035] mean 
value: 0.850096515501237 key: test_recall value: [0.73529412 0.77941176 0.7826087 0.8115942 0.75 0.85294118 0.76470588 0.76470588 0.73529412 0.79411765] mean value: 0.7770673486786018 key: train_recall value: [0.80618893 0.78501629 0.80097879 0.80097879 0.71335505 0.78501629 0.78664495 0.77687296 0.81596091 0.78990228] mean value: 0.7860915240367499 key: test_roc_auc value: [0.77344416 0.83173487 0.81777494 0.83226769 0.82352941 0.81617647 0.79411765 0.78676471 0.82352941 0.85294118] mean value: 0.8152280477408355 key: train_roc_auc value: [0.83213198 0.82073 0.82882816 0.82557083 0.80863192 0.82247557 0.82736156 0.81840391 0.83143322 0.81921824] mean value: 0.8234785404190423 key: test_jcc value: [0.61728395 0.69736842 0.6835443 0.70886076 0.68 0.69879518 0.65 0.64197531 0.67567568 0.72972973] mean value: 0.6783233329731327 key: train_jcc value: [0.70613409 0.68660969 0.70042796 0.6964539 0.65081724 0.68857143 0.69496403 0.68142857 0.70762712 0.68599717] mean value: 0.6899031196349484 MCC on Blind test: 0.37 Accuracy on Blind test: 0.78 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02590728 0.01991701 0.01922178 0.01944113 0.01963425 0.01889205 0.01906657 0.01980186 0.019485 0.01988626] mean value: 0.020125317573547363 key: score_time value: [0.0207057 0.01319242 0.01319385 0.01324558 0.01323295 0.01321483 0.01333451 0.01359987 0.01312637 0.01353455] mean value: 0.01403806209564209 key: test_mcc value: [0.59494906 0.66746486 0.59205603 0.53314859 0.64733887 0.58430655 0.4853466 0.63296924 0.51476155 0.66183628] mean value: 0.5914177619460329 key: train_mcc value: [0.6332577 0.60556753 0.63000061 0.59274722 0.60261865 0.60755713 0.63849068 0.61889579 0.61728431 0.59446333] mean value: 0.6140882947531848 key: test_accuracy value: [0.79562044 0.83211679 0.79562044 0.76642336 0.82352941 0.78676471 0.74264706 0.81617647 0.75735294 0.83088235] mean value: 0.7947133963074281 key: train_accuracy value: [0.81662592 0.80277099 0.81499593 0.79625102 0.80130293 0.80374593 0.81921824 0.80944625 0.80863192 0.79723127] mean value: 0.8070220394012037 key: test_fscore value: [0.78125 0.82170543 0.8028169 0.76470588 0.82608696 0.80536913 0.74452555 0.82014388 0.75555556 0.83211679] mean value: 0.7954276070370564 key: train_fscore value: [0.81722177 0.80388979 0.81529699 0.79304636 0.80194805 0.80229696 0.81803279 0.8091354 0.80940795 0.79706601] mean value: 0.8067342073255446 key: test_precision value: [0.83333333 0.86885246 0.78082192 0.7761194 0.81428571 0.74074074 0.73913043 0.8028169 0.76119403 0.82608696] mean value: 0.794338189073302 key: train_precision value: [0.81523501 0.8 0.81331169 0.80504202 0.79935275 0.80826446 0.82343234 0.81045752 0.80613893 
0.79771615] mean value: 0.8078950870261012 key: test_recall value: [0.73529412 0.77941176 0.82608696 0.75362319 0.83823529 0.88235294 0.75 0.83823529 0.75 0.83823529] mean value: 0.799147485080989 key: train_recall value: [0.81921824 0.80781759 0.81729201 0.78140294 0.80456026 0.79641694 0.81270358 0.80781759 0.81270358 0.79641694] mean value: 0.8056349666030788 key: test_roc_auc value: [0.79518329 0.83173487 0.79539642 0.76651748 0.82352941 0.78676471 0.74264706 0.81617647 0.75735294 0.83088235] mean value: 0.7946184995737425 key: train_roc_auc value: [0.8166238 0.80276687 0.81499779 0.79623893 0.80130293 0.80374593 0.81921824 0.80944625 0.80863192 0.79723127] mean value: 0.8070203941740042 key: test_jcc value: [0.64102564 0.69736842 0.67058824 0.61904762 0.7037037 0.6741573 0.59302326 0.69512195 0.60714286 0.7125 ] mean value: 0.6613678987670822 key: train_jcc value: [0.69093407 0.67208672 0.68818681 0.65706447 0.66937669 0.66986301 0.69209431 0.67945205 0.67983651 0.66260163] mean value: 0.676149628585884 MCC on Blind test: 0.34 Accuracy on Blind test: 0.71 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0179677 0.01519823 0.01462317 0.01480222 0.01445556 0.01456404 0.01472354 0.01493502 0.01904297 0.01459312] mean value: 0.015490555763244629 key: score_time value: [0.04223895 0.02163744 0.03045964 0.02881622 0.02916098 0.03041983 0.03320718 0.03256726 0.03657246 0.03049612] mean value: 0.03155760765075684 key: test_mcc value: [0.73634276 0.74250909 0.79104463 0.7744776 0.71364124 0.81150267 0.75139136 0.8131434 0.83258145 0.79405762] mean value: 0.7760691817616058 key: train_mcc value: [0.86063511 0.85844508 0.85883488 0.85524077 0.86229972 0.85903147 0.86695884 0.85204744 0.86258777 0.8582323 ] mean value: 0.8594313394535598 key: test_accuracy value: [0.86131387 0.86861314 0.89051095 0.88321168 0.85294118 0.89705882 0.86764706 0.90441176 0.91176471 0.88970588] mean value: 0.8827179046801202 key: train_accuracy value: [0.92665037 0.92583537 0.92583537 0.92339038 0.92833876 0.92508143 0.92996743 0.9218241 0.92752443 0.92508143] mean value: 0.925952908101909 key: test_fscore value: [0.87248322 0.875 0.89932886 0.89189189 0.8630137 0.90666667 0.88 0.90909091 0.91780822 0.89932886] mean value: 0.8914612325055002 key: train_fscore value: [0.93119266 0.9302682 0.9302682 0.92835366 0.93220339 0.93009119 0.93415008 0.92694064 0.93200917 0.92987805] mean value: 0.9305355224718177 key: test_precision value: [0.80246914 0.82894737 0.8375 0.83544304 0.80769231 0.82926829 0.80487805 0.86666667 0.85897436 0.82716049] mean value: 0.8298999710822114 key: train_precision value: [0.87752161 0.87843705 0.87716763 0.87124464 0.88450292 0.87179487 0.88150289 0.87 0.87769784 
0.8739255 ] mean value: 0.8763794955944838 key: test_recall value: [0.95588235 0.92647059 0.97101449 0.95652174 0.92647059 1. 0.97058824 0.95588235 0.98529412 0.98529412] mean value: 0.9633418584825235 key: train_recall value: [0.99185668 0.98859935 0.99021207 0.99347471 0.98534202 0.99674267 0.99348534 0.99185668 0.99348534 0.99348534] mean value: 0.991854020649234 key: test_roc_auc value: [0.86199915 0.8690324 0.88991901 0.88267263 0.85294118 0.89705882 0.86764706 0.90441176 0.91176471 0.88970588] mean value: 0.8827152600170503 key: train_roc_auc value: [0.92659718 0.92578418 0.92588779 0.92344745 0.92833876 0.92508143 0.92996743 0.9218241 0.92752443 0.92508143] mean value: 0.9259534196640646 key: test_jcc value: [0.77380952 0.77777778 0.81707317 0.80487805 0.75903614 0.82926829 0.78571429 0.83333333 0.84810127 0.81707317] mean value: 0.8046065013962848 key: train_jcc value: [0.87124464 0.86962751 0.86962751 0.86628734 0.87301587 0.86931818 0.87643678 0.86382979 0.87267525 0.86894587] mean value: 0.8701008732472146 MCC on Blind test: 0.36 Accuracy on Blind test: 0.85 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.09757638 0.09589529 0.09379077 0.09431267 0.08427525 0.09253359 0.09280396 0.0933609 0.09143758 0.09396386] mean value: 0.092995023727417 key: score_time value: [0.03198242 0.03211188 0.03194284 0.03166461 0.03045869 0.03099656 0.0306046 0.03076148 0.03069448 0.03143573] mean value: 0.03126533031463623 key: test_mcc value: [0.76762243 0.79590547 0.75369214 0.78496269 0.76503685 0.78357455 0.72254413 0.79446135 0.75073095 0.86774089] mean value: 0.7786271446200027 key: train_mcc value: [0.84380437 0.83865748 0.84037446 0.83212135 0.85204583 0.83555919 0.82737912 0.83582531 0.84856234 0.82578657] mean value: 0.8380116016387307 key: test_accuracy value: [0.88321168 0.89781022 0.87591241 0.89051095 0.88235294 0.88970588 0.86029412 0.89705882 0.875 0.93382353] mean value: 0.88856805495921 key: train_accuracy value: [0.92176039 0.9193154 0.9201304 0.91605542 0.92589577 0.91775244 0.91368078 0.91775244 0.9242671 0.91286645] mean value: 0.9189476597405286 key: test_fscore value: [0.87878788 0.89552239 0.88111888 0.89655172 0.88405797 0.8951049 0.86524823 0.89552239 0.87769784 0.93430657] mean value: 0.890391876430352 key: train_fscore value: [0.92282958 0.91970803 0.92071197 0.91619203 0.92679002 0.91821862 0.91396104 0.9188755 0.92457421 0.91336032] mean value: 0.9195221333056501 key: test_precision value: [0.90625 0.90909091 0.85135135 0.85526316 0.87142857 0.85333333 0.83561644 0.90909091 0.85915493 0.92753623] mean value: 0.8778115832007498 key: train_precision value: [0.91111111 0.91599354 0.91332263 0.91396104 0.91573927 0.91304348 0.91100324 0.90649762 0.92084006 
0.90821256] mean value: 0.9129724551475382 key: test_recall value: [0.85294118 0.88235294 0.91304348 0.94202899 0.89705882 0.94117647 0.89705882 0.88235294 0.89705882 0.94117647] mean value: 0.9046248934356351 key: train_recall value: [0.93485342 0.92345277 0.92822186 0.91843393 0.93811075 0.92345277 0.91693811 0.93159609 0.92833876 0.91856678] mean value: 0.9261965237444936 key: test_roc_auc value: [0.88299233 0.89769821 0.87563939 0.89013214 0.88235294 0.88970588 0.86029412 0.89705882 0.875 0.93382353] mean value: 0.8884697357203751 key: train_roc_auc value: [0.92174971 0.91931203 0.92013699 0.91605736 0.92589577 0.91775244 0.91368078 0.91775244 0.9242671 0.91286645] mean value: 0.9189471069285992 key: test_jcc value: [0.78378378 0.81081081 0.7875 0.8125 0.79220779 0.81012658 0.7625 0.81081081 0.78205128 0.87671233] mean value: 0.8029003390710084 key: train_jcc value: [0.85671642 0.85135135 0.85307346 0.84534535 0.86356822 0.8488024 0.84155456 0.84992571 0.85972851 0.84053651] mean value: 0.8510602473270432 MCC on Blind test: 0.52 Accuracy on Blind test: 0.85 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [ 9.81939363 11.19015908 13.26645947 11.72809553 6.11142302 5.9774859 11.22838759 8.03849244 6.38793683 2.85526633] mean value: 8.660309982299804 key: score_time value: [0.01772332 0.02231932 0.02762604 0.01474333 0.01319838 0.02981472 0.01590967 0.01788568 0.0153389 0.0166378 ] mean value: 0.01911971569061279 key: test_mcc value: [0.98550725 0.88938138 0.9158731 0.92944673 0.84942274 0.92898531 0.92898531 0.97100831 0.94280904 0.95681396] mean value: 0.9298233117785435 key: train_mcc value: [0.9967453 0.93067304 0.99350118 0.99837134 0.99512588 0.98224233 0.99837266 0.99674796 0.99837266 0.96007955] mean value: 0.9850231903911716 key: test_accuracy value: [0.99270073 0.94160584 0.95620438 0.96350365 0.91911765 0.96323529 0.96323529 0.98529412 0.97058824 0.97794118] mean value: 0.9633426363246028 key: train_accuracy value: [0.99837001 0.96414018 0.99674002 0.999185 0.997557 0.99104235 0.99918567 0.99837134 0.99918567 0.97964169] mean value: 0.992341892117901 key: test_fscore value: [0.99270073 0.94444444 0.95833333 0.96503497 0.92517007 0.96453901 0.96453901 0.98550725 0.97142857 0.97841727] mean value: 0.9650114638943791 key: train_fscore value: [0.99837398 0.96540881 0.99674797 0.999185 0.99756296 
0.99112187 0.99918633 0.99837398 0.99918633 0.98004789] mean value: 0.9925195119264727 key: test_precision value: [0.98550725 0.89473684 0.92 0.93243243 0.86075949 0.93150685 0.93150685 0.97142857 0.94444444 0.95774648] mean value: 0.9330069207961785 key: train_precision value: [0.99675325 0.9331307 0.99351702 0.99837134 0.99513776 0.9824 0.99837398 0.99675325 0.99837398 0.96087637] mean value: 0.9853687646105626 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.99275362 0.94202899 0.95588235 0.96323529 0.91911765 0.96323529 0.96323529 0.98529412 0.97058824 0.97794118] mean value: 0.9633312020460358 key: train_roc_auc value: [0.99836868 0.96411093 0.99674267 0.99918567 0.997557 0.99104235 0.99918567 0.99837134 0.99918567 0.97964169] mean value: 0.9923391660600135 key: test_jcc value: [0.98550725 0.89473684 0.92 0.93243243 0.86075949 0.93150685 0.93150685 0.97142857 0.94444444 0.95774648] mean value: 0.9330069207961785 key: train_jcc value: [0.99675325 0.9331307 0.99351702 0.99837134 0.99513776 0.9824 0.99837398 0.99675325 0.99837398 0.96087637] mean value: 0.9853687646105626 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.07515264 0.06080317 0.06919646 0.07247376 0.07183075 0.0628655 0.06409168 0.06514215 0.06302285 0.06468797] mean value: 0.0669266939163208 key: score_time value: [0.01082349 0.01040816 0.01027727 0.01023531 0.01051998 0.01028347 0.01050258 0.0106225 0.01002073 0.0105207 ] mean value: 0.010421419143676757 key: test_mcc value: [1. 0.87631485 0.9158731 0.88920184 0.90184995 0.94280904 0.91533482 0.95681396 0.95681396 0.91533482] mean value: 0.9270346350821976 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93430657 0.95620438 0.94160584 0.94852941 0.97058824 0.95588235 0.97794118 0.97794118 0.95588235] mean value: 0.9618881494203521 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93793103 0.95833333 0.94520548 0.95104895 0.97142857 0.95774648 0.97841727 0.97841727 0.95774648] mean value: 0.9636274859866248 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88311688 0.92 0.8961039 0.90666667 0.94444444 0.91891892 0.95774648 0.95774648 0.91891892] mean value: 0.9303662685916207 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.93478261 0.95588235 0.94117647 0.94852941 0.97058824 0.95588235 0.97794118 0.97794118 0.95588235] mean value: 0.9618606138107417 key: train_roc_auc value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88311688 0.92 0.8961039 0.90666667 0.94444444 0.91891892 0.95774648 0.95774648 0.91891892] mean value: 0.9303662685916207 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.61 Accuracy on Blind test: 0.91 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.20083737 0.19879031 0.20903301 0.19165325 0.2012167 0.19774413 0.18660641 0.18971586 0.2091291 0.19987702] mean value: 0.19846031665802003 key: score_time value: [0.02021098 0.02027822 0.02088189 0.02075434 0.01990175 0.02190304 0.02135897 0.02068901 0.02141094 0.02544594] mean value: 0.021283507347106934 key: test_mcc value: [1. 0.98550725 1. 0.97120941 0.97100831 0.98540068 0.98540068 0.98540068 0.97100831 1. ] mean value: 0.9854935311865046 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.99270073 1. 0.98540146 0.98529412 0.99264706 0.99264706 0.99264706 0.98529412 1. ] mean value: 0.9926631601545728 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.99270073 1. 0.98571429 0.98550725 0.99270073 0.99270073 0.99270073 0.98550725 1. 
] mean value: 0.9927531698175939 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.98550725 1. 0.97183099 0.97142857 0.98550725 0.98550725 0.98550725 0.97142857 1. ] mean value: 0.9856717114279883 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.99275362 1. 0.98529412 0.98529412 0.99264706 0.99264706 0.99264706 0.98529412 1. ] mean value: 0.992657715260017 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.98550725 1. 0.97183099 0.97142857 0.98550725 0.98550725 0.98550725 0.97142857 1. ] mean value: 0.9856717114279883 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.02046251 0.02154493 0.02026796 0.02045822 0.01979375 0.02021337 0.02100277 0.02029634 0.02117229 0.02066207] mean value: 0.02058742046356201 key: score_time value: [0.01327848 0.01783276 0.01335645 0.01328135 0.01373529 0.01361465 0.01344752 0.01323819 0.01321507 0.01345181] mean value: 0.013845157623291016 key: test_mcc value: [0.94323594 0.94323594 0.90246052 0.87609014 0.91533482 0.83666003 0.92898531 0.90184995 0.92898531 0.94280904] mean value: 0.9119646995632092 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97080292 0.97080292 0.94890511 0.93430657 0.95588235 0.91176471 0.96323529 0.94852941 0.96323529 0.97058824] mean value: 0.9538052812365823 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97142857 0.97142857 0.95172414 0.93877551 0.95774648 0.91891892 0.96453901 0.95104895 0.96453901 0.97142857] mean value: 0.9561577725446336 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94444444 0.94444444 0.90789474 0.88461538 0.91891892 0.85 0.93150685 0.90666667 0.93150685 0.94444444] mean value: 0.9164442739006545 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.97101449 0.97101449 0.94852941 0.93382353 0.95588235 0.91176471 0.96323529 0.94852941 0.96323529 0.97058824] mean value: 0.9537617220801364 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94444444 0.94444444 0.90789474 0.88461538 0.91891892 0.85 0.93150685 0.90666667 0.93150685 0.94444444] mean value: 0.9164442739006545 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.75 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3.
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [8.2776711 4.21938848 4.72391248 4.15709138 4.41321802 4.4025979 4.49912047 4.87815547 4.51623201 3.0488472 ] mean value: 4.713623452186584 key: score_time value: [0.28151059 0.13990307 0.17840743 0.16440082 0.13978744 0.13792515 0.16863894 0.14216638 0.10375071 0.11206579] mean value: 0.15685563087463378 key: test_mcc value: [1. 0.95713391 0.97120941 0.95710706 0.94280904 0.97100831 0.98540068 0.97100831 0.98540068 1. ] mean value: 0.9741077402498759 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97810219 0.98540146 0.97810219 0.97058824 0.98529412 0.99264706 0.98529412 0.99264706 1. ] mean value: 0.9868076427651353 key: train_accuracy value: [1. 1.
1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.97841727 0.98571429 0.9787234 0.97142857 0.98550725 0.99270073 0.98550725 0.99270073 1. ] mean value: 0.9870699480192865 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95774648 0.97183099 0.95833333 0.94444444 0.97142857 0.98550725 0.97142857 0.98550725 1. ] mean value: 0.9746226878177277 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.97826087 0.98529412 0.97794118 0.97058824 0.98529412 0.99264706 0.98529412 0.99264706 1. ] mean value: 0.9867966751918158 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.95774648 0.97183099 0.95833333 0.94444444 0.97142857 0.98550725 0.97142857 0.98550725 1. ] mean value: 0.9746226878177277 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.88 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.30284762 1.3088181 1.33434439 1.30789351 1.41369963 1.26666975 1.26418257 1.34297037 1.33749843 1.32313275] mean value: 1.3202057123184203 key: score_time value: [0.24020386 0.2401464 0.17477727 0.19462323 0.14892769 0.15167356 0.25871444 0.14097333 0.13530946 0.15493202] mean value: 0.18402812480926514 key: test_mcc value: [0.98550418 0.92710997 0.94199209 0.94318882 0.89715584 0.94280904 0.97100831 0.95598573 0.97100831 0.98540068] mean value: 0.9521162979128164 key: train_mcc value: [0.98536269 0.98370525 0.98374725 0.99188303 0.98371336 0.98373423 0.9902753 0.98537469 0.9886636 0.98373423] mean value: 0.9860193630081547 key: test_accuracy value: [0.99270073 0.96350365 0.97080292 0.97080292 0.94852941 0.97058824 0.98529412 0.97794118 0.98529412 0.99264706] mean value: 0.9758104336625161 key: train_accuracy value: [0.99266504 0.99185004 0.99185004 0.99592502 0.99185668 0.99185668 0.99511401 0.99267101 0.99429967 0.99185668] mean 
value: 0.9929944861676343 key: test_fscore value: [0.99259259 0.96350365 0.97142857 0.97183099 0.94890511 0.97142857 0.98550725 0.97810219 0.98550725 0.99270073] mean value: 0.9761506892950969 key: train_fscore value: [0.99270073 0.99186992 0.99188312 0.99593826 0.99185668 0.99188312 0.99513776 0.99270073 0.99433198 0.99188312] mean value: 0.9930185415479755 key: test_precision value: [1. 0.95652174 0.95774648 0.94520548 0.94202899 0.94444444 0.97142857 0.97101449 0.97142857 0.98550725] mean value: 0.9645326009394998 key: train_precision value: [0.98869144 0.99025974 0.98707593 0.99190939 0.99185668 0.98867314 0.99032258 0.98869144 0.98872786 0.98867314] mean value: 0.9894881324676252 key: test_recall value: [0.98529412 0.97058824 0.98550725 1. 0.95588235 1. 1. 0.98529412 1. 1. ] mean value: 0.9882566069906223 key: train_recall value: [0.99674267 0.99348534 0.99673736 1. 0.99185668 0.99511401 1. 0.99674267 1.
0.99511401] mean value: 0.9965792731852214 key: test_roc_auc value: [0.99264706 0.96355499 0.9706948 0.97058824 0.94852941 0.97058824 0.98529412 0.97794118 0.98529412 0.99264706] mean value: 0.9757779198635976 key: train_roc_auc value: [0.99266171 0.99184871 0.99185402 0.99592834 0.99185668 0.99185668 0.99511401 0.99267101 0.99429967 0.99185668] mean value: 0.9929947500146128 key: test_jcc value: [0.98529412 0.92957746 0.94444444 0.94520548 0.90277778 0.94444444 0.97142857 0.95714286 0.97142857 0.98550725] mean value: 0.9537250974931324 key: train_jcc value: [0.98550725 0.98387097 0.98389694 0.99190939 0.98384491 0.98389694 0.99032258 0.98550725 0.98872786 0.98389694] mean value: 0.9861381016950115 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.04272056 0.05943704 0.05153179 0.02936983 0.01935029 0.01928663 0.01935291 0.01940179 0.01969457 0.02082396] mean value: 0.030096936225891113 key: score_time value: [0.0132606 0.01356125 0.02673125 0.01406693 0.01334238 0.01322699 0.01327682 0.01417208 0.01328945 0.0136919 ] mean value: 0.014861965179443359 key: test_mcc value: [0.59494906 0.66746486 0.59205603 0.53314859 0.64733887 0.58430655 0.4853466 0.63296924 0.51476155 0.66183628] mean value: 0.5914177619460329 key: train_mcc value: [0.6332577 0.60556753 0.63000061 0.59274722 0.60261865 0.60755713 0.63849068 0.61889579 0.61728431 0.59446333] mean value: 0.6140882947531848 key: test_accuracy value: [0.79562044 0.83211679 0.79562044 0.76642336 0.82352941 0.78676471 0.74264706 0.81617647 0.75735294 0.83088235] mean value: 0.7947133963074281 key: train_accuracy value: [0.81662592 0.80277099 0.81499593 0.79625102 0.80130293 0.80374593 0.81921824 0.80944625 0.80863192 0.79723127] mean value: 0.8070220394012037 key: test_fscore value: [0.78125 0.82170543 0.8028169 0.76470588 0.82608696 0.80536913 0.74452555 0.82014388 0.75555556 0.83211679] mean value: 0.7954276070370564 key: train_fscore value: [0.81722177 0.80388979 0.81529699 0.79304636 0.80194805 0.80229696 0.81803279 0.8091354 0.80940795 0.79706601] mean value: 0.8067342073255446 key: test_precision value: [0.83333333 0.86885246 0.78082192 0.7761194 0.81428571 0.74074074 0.73913043 0.8028169 0.76119403 0.82608696] mean value: 0.794338189073302 key: train_precision value: [0.81523501 0.8 0.81331169 0.80504202 0.79935275 0.80826446 0.82343234 0.81045752 0.80613893 
0.79771615] mean value: 0.8078950870261012 key: test_recall value: [0.73529412 0.77941176 0.82608696 0.75362319 0.83823529 0.88235294 0.75 0.83823529 0.75 0.83823529] mean value: 0.799147485080989 key: train_recall value: [0.81921824 0.80781759 0.81729201 0.78140294 0.80456026 0.79641694 0.81270358 0.80781759 0.81270358 0.79641694] mean value: 0.8056349666030788 key: test_roc_auc value: [0.79518329 0.83173487 0.79539642 0.76651748 0.82352941 0.78676471 0.74264706 0.81617647 0.75735294 0.83088235] mean value: 0.7946184995737425 key: train_roc_auc value: [0.8166238 0.80276687 0.81499779 0.79623893 0.80130293 0.80374593 0.81921824 0.80944625 0.80863192 0.79723127] mean value: 0.8070203941740042 key: test_jcc value: [0.64102564 0.69736842 0.67058824 0.61904762 0.7037037 0.6741573 0.59302326 0.69512195 0.60714286 0.7125 ] mean value: 0.6613678987670822 key: train_jcc value: [0.69093407 0.67208672 0.68818681 0.65706447 0.66937669 0.66986301 0.69209431 0.67945205 0.67983651 0.66260163] mean value: 0.676149628585884 MCC on Blind test: 0.34 Accuracy on Blind test: 0.71 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.3358233 1.51815987 1.29080606 0.34695745 1.41464186 0.21330214 1.50222182 1.55371165 0.99084353 0.60692477] mean value: 0.977339243888855 key: score_time value: [0.01254296 0.01423478 0.01312089 0.01385069 0.01297879 0.01231813 0.01395893 0.01350212 0.02170682 0.01221442] mean value: 0.014042854309082031 key: test_mcc value: [1. 0.97122151 0.95710706 0.94318882 0.94280904 0.95681396 0.97100831 0.98540068 0.98540068 0.98540068] mean value: 0.9698350737002243 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98540146 0.97810219 0.97080292 0.97058824 0.97794118 0.98529412 0.99264706 0.99264706 0.99264706] mean value: 0.9846071275225419 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98550725 0.9787234 0.97183099 0.97142857 0.97841727 0.98550725 0.99270073 0.99270073 0.99270073] mean value: 0.9849516910321079 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.97142857 0.95833333 0.94520548 0.94444444 0.95774648 0.97142857 0.98550725 0.98550725 0.98550725] mean value: 0.970510861809065 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.98550725 0.97794118 0.97058824 0.97058824 0.97794118 0.98529412 0.99264706 0.99264706 0.99264706] mean value: 0.9845801364023871 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.97142857 0.95833333 0.94520548 0.94444444 0.95774648 0.97142857 0.98550725 0.98550725 0.98550725] mean value: 0.970510861809065 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.82 Accuracy on Blind test: 0.95 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.06717873 0.10198259 0.08658075 0.10811377 0.61797523 0.07925892 0.11409497 0.10496569 0.09061217 0.10234213] mean value: 0.14731049537658691 key: score_time value: [0.02386189 0.02009273 0.02184725 0.02548599 0.02336287 0.03601003 0.02148342 0.01982164 0.01296329 0.02166152] mean value: 0.0226590633392334 key: test_mcc value: [0.812277 0.87099729 0.8251972 0.800926 0.70710678 0.82928843 0.79549513 0.8131434 0.78632938 0.89715584] mean value: 0.8137916451598168 key: train_mcc value: [0.84694026 0.87678779 0.87332809 0.86656872 0.88172633 0.87326071 0.8684111 0.86858635 0.88488253 0.8457717 ] mean value: 0.8686263570909856 key: test_accuracy value: [0.90510949 0.93430657 0.91240876 0.89781022 
0.85294118 0.91176471 0.89705882 0.90441176 0.88970588 0.94852941] mean value: 0.9054046801202232 key: train_accuracy value: [0.92339038 0.93806031 0.93643032 0.93317033 0.94055375 0.93648208 0.93403909 0.93403909 0.94218241 0.92263844] mean value: 0.9340986198163471 key: test_fscore value: [0.90076336 0.93617021 0.91176471 0.90410959 0.85714286 0.91666667 0.9 0.90909091 0.89655172 0.94890511] mean value: 0.9081165132995447 key: train_fscore value: [0.92419355 0.93929712 0.93739968 0.93387097 0.94164668 0.93729904 0.93493976 0.93514812 0.94315452 0.92393915] mean value: 0.9350888590196927 key: test_precision value: [0.93650794 0.90410959 0.92537313 0.85714286 0.83333333 0.86842105 0.875 0.86666667 0.84415584 0.94202899] mean value: 0.8852739399314917 key: train_precision value: [0.91533546 0.92163009 0.92259084 0.92344498 0.92464678 0.92539683 0.92234548 0.91968504 0.92755906 0.90866142] mean value: 0.9211295973019243 key: test_recall value: [0.86764706 0.97058824 0.89855072 0.95652174 0.88235294 0.97058824 0.92647059 0.95588235 0.95588235 0.95588235] mean value: 0.9340366581415175 key: train_recall value: [0.93322476 0.95765472 0.95269168 0.94453507 0.95928339 0.9495114 0.94788274 0.95114007 0.95928339 0.93973941] mean value: 0.9494946623377313 key: test_roc_auc value: [0.90483802 0.93456948 0.91251066 0.89737852 0.85294118 0.91176471 0.89705882 0.90441176 0.88970588 0.94852941] mean value: 0.9053708439897699 key: train_roc_auc value: [0.92338236 0.93804433 0.93644356 0.93317959 0.94055375 0.93648208 0.93403909 0.93403909 0.94218241 0.92263844] mean value: 0.9340984691085121 key: test_jcc value: [0.81944444 0.88 0.83783784 0.825 0.75 0.84615385 0.81818182 0.83333333 0.8125 0.90277778] mean value: 0.8325229057729058 key: train_jcc value: [0.85907046 0.88554217 0.88217523 0.87594554 0.8897281 0.88199697 0.87782805 0.87819549 0.89242424 0.85863095] mean value: 0.8781537205877241 MCC on Blind test: 0.52 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: 
MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01767254 0.02120185 0.02006078 0.01994205 0.01963353 0.01972198 0.01966166 0.02000618 0.0196619 0.01966596] mean value: 0.019722843170166017 key: score_time value: [0.01396465 0.01512766 0.01483417 0.01484108 0.01479626 0.01485062 0.01475906 0.01479578 0.01479769 0.0147922 ] mean value: 0.014755916595458985 key: test_mcc value: [0.59324085 0.6380904 0.59804827 0.63690876 0.69125122 0.5905386 0.54464795 0.61871843 0.56101167 0.67911938] mean value: 0.6151575523836643 key: train_mcc value: [0.64023262 0.62893946 0.63068241 0.61268574 0.63551639 0.63232786 0.66325449 0.64177893 0.66140974 0.63849068] mean value: 0.6385318324603495 key: test_accuracy value: [0.79562044 0.81751825 0.79562044 0.81751825 0.84558824 0.79411765 0.77205882 0.80882353 0.77941176 0.83823529] mean value: 0.8064512666380421 key: train_accuracy value: [0.8198859 0.81418093 0.81499593 0.80603097 0.81758958 0.81596091 0.83143322 0.82084691 0.83061889 0.81921824] mean value: 0.8190761476974374 key: test_fscore value: [0.78461538 0.80620155 0.78125 0.81203008 0.84444444 0.8028169 0.77697842 0.8030303 0.76923077 0.83076923] mean value: 0.8011367076340337 key: train_fscore value: [0.81659751 0.81031614 0.81035923 0.80133556 0.81456954 0.81260365 0.82850041 0.81937603 
0.82866557 0.81803279] mean value: 0.8160356421443249 key: test_precision value: [0.82258065 0.85245902 0.84745763 0.84375 0.85074627 0.77027027 0.76056338 0.828125 0.80645161 0.87096774] mean value: 0.8253371562720764 key: train_precision value: [0.83248731 0.82823129 0.83047945 0.82051282 0.82828283 0.8277027 0.84317032 0.82615894 0.83833333 0.82343234] mean value: 0.8298791343084553 key: test_recall value: [0.75 0.76470588 0.72463768 0.7826087 0.83823529 0.83823529 0.79411765 0.77941176 0.73529412 0.79411765] mean value: 0.7801364023870417 key: train_recall value: [0.80130293 0.79315961 0.79119086 0.78303426 0.80130293 0.7980456 0.81433225 0.81270358 0.81921824 0.81270358] mean value: 0.8026993851990797 key: test_roc_auc value: [0.79528986 0.81713555 0.79614237 0.81777494 0.84558824 0.79411765 0.77205882 0.80882353 0.77941176 0.83823529] mean value: 0.806457800511509 key: train_roc_auc value: [0.81990106 0.81419808 0.81497654 0.80601224 0.81758958 0.81596091 0.83143322 0.82084691 0.83061889 0.81921824] mean value: 0.8190755668443231 key: test_jcc value: [0.64556962 0.67532468 0.64102564 0.6835443 0.73076923 0.67058824 0.63529412 0.67088608 0.625 0.71052632] mean value: 0.6688528215850197 key: train_jcc value: [0.69004208 0.68111888 0.68117978 0.66852368 0.68715084 0.68435754 0.70721358 0.69401947 0.70745429 0.69209431] mean value: 0.6893154442079789 MCC on Blind test: 0.39 Accuracy on Blind test: 0.75 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03860188 0.04693508 0.0453124 0.03687978 0.04116654 0.03766918 0.03457999 0.03815031 0.07156754 0.03939486] mean value: 0.043025755882263185 key: score_time value: [0.01524901 0.01752234 0.02431965 0.01302862 0.01294494 0.01732397 0.02216196 0.03092146 0.02712727 0.01669025] mean value: 0.019728946685791015 key: test_mcc value: [0.89863497 0.81027501 0.83947987 0.82066286 0.76503685 0.56666667 0.7768986 0.71492035 0.79446135 0.83258145] mean value: 0.7819617977261805 key: train_mcc value: [0.87474955 0.87939986 0.83059333 0.82722888 0.88445984 0.56230531 0.83579768 0.72591688 0.8681346 0.75795971] mean value: 0.8046545639530074 key: test_accuracy value: [0.94890511 0.90510949 0.91970803 0.90510949 0.88235294 0.75 0.88235294 0.83823529 0.89705882 0.91176471] mean value: 0.8840596822670674 key: train_accuracy value: [0.93724531 0.9396903 0.91524042 0.90872046 0.94218241 0.74348534 0.91286645 0.84609121 0.93403909 0.87052117] mean value: 0.8950082163269966 key: test_fscore value: [0.94736842 0.9037037 0.92086331 0.91275168 0.88405797 0.67307692 0.89189189 0.86075949 0.89855072 0.9047619 ] mean value: 0.8797786021014982 key: train_fscore value: [0.9380531 0.93954248 0.91585761 0.91515152 0.94260307 0.65798046 0.9191232 0.86624204 0.93441296 0.85532302] mean value: 0.8884289448756847 key: test_precision value: [0.96923077 0.91044776 0.91428571 0.85 0.87142857 0.97222222 0.825 0.75555556 0.88571429 0.98275862] mean value: 0.8936643500320803 key: train_precision value: [0.92686804 0.94262295 0.90850722 0.854314 0.93579454 0.98697068 
0.85754584 0.76595745 0.92914654 0.96907216] mean value: 0.9076799436662107 key: test_recall value: [0.92647059 0.89705882 0.92753623 0.98550725 0.89705882 0.51470588 0.97058824 1. 0.91176471 0.83823529] mean value: 0.8868925831202046 key: train_recall value: [0.9495114 0.93648208 0.9233279 0.98531811 0.9495114 0.49348534 0.99022801 0.99674267 0.93973941 0.76547231] mean value: 0.8929818641699124 key: test_roc_auc value: [0.94874254 0.90505115 0.91965047 0.90451833 0.88235294 0.75 0.88235294 0.83823529 0.89705882 0.91176471] mean value: 0.8839727195225917 key: train_roc_auc value: [0.93723531 0.93969292 0.91524701 0.90878283 0.94218241 0.74348534 0.91286645 0.84609121 0.93403909 0.87052117] mean value: 0.8950143736948101 key: test_jcc value: [0.9 0.82432432 0.85333333 0.83950617 0.79220779 0.50724638 0.80487805 0.75555556 0.81578947 0.82608696] mean value: 0.7918928034058543 key: train_jcc value: [0.88333333 0.88597843 0.84477612 0.84357542 0.89143731 0.49029126 0.85034965 0.76404494 0.8768997 0.74721781] mean value: 0.8077903967346308 MCC on Blind test: 0.45 Accuracy on Blind test: 0.75 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.04438257 0.04041004 0.03820419 0.04464841 0.03713417 0.04714775 0.04193568 0.03920817 0.04001617 0.03890991] mean value: 0.04119970798492432 key: score_time value: [0.01795697 0.0130899 0.01262331 0.01278138 0.01264524 0.01248693 0.0125463 0.01261425 0.0127604 0.01984 ] mean value: 0.013934469223022461 key: test_mcc value: [0.9158731 0.89869927 0.78803902 0.84660737 0.64549722 0.82402205 0.80961181 0.69577462 0.82928843 0.84567499] mean value: 0.8099087887121932 key: train_mcc value: [0.88594196 0.88186074 0.82791629 0.89452489 0.70887969 0.90755723 0.88782117 0.71461937 0.90692901 0.84856482] mean value: 0.8464615175108487 key: test_accuracy value: [0.95620438 0.94890511 0.89051095 0.91970803 0.79411765 0.90441176 0.90441176 0.83088235 0.91176471 0.91911765] mean value: 0.8980034349506226 key: train_accuracy value: [0.94295029 0.9405053 0.91198044 0.94621027 0.83550489 0.9519544 0.94381107 0.84527687 0.95276873 0.9218241 ] mean value: 0.9192786356915121 key: test_fscore value: [0.95384615 0.94964029 0.88372093 0.92517007 0.82926829 0.91275168 0.90647482 0.8 0.91666667 0.92413793] mean value: 0.9001676828256018 key: train_fscore value: [0.94327391 0.94183267 0.90737564 0.94794953 0.85834502 0.95401403 0.94439968 0.82242991 0.9540412 0.92581144] mean value: 0.9199473022076147 key: test_precision value: [1. 
0.92957746 0.95 0.87179487 0.70833333 0.83950617 0.88732394 0.9787234 0.86842105 0.87012987] mean value: 0.8903810113435183 key: train_precision value: [0.93870968 0.92199688 0.95660036 0.91755725 0.75369458 0.91479821 0.93460925 0.96491228 0.92901235 0.88088235] mean value: 0.9112773188146082 key: test_recall value: [0.91176471 0.97058824 0.82608696 0.98550725 1. 1. 0.92647059 0.67647059 0.97058824 0.98529412] mean value: 0.9252770673486787 key: train_recall value: [0.94788274 0.96254072 0.862969 0.98042414 0.99674267 0.99674267 0.95439739 0.71661238 0.98045603 0.97557003] mean value: 0.9374337773857411 key: test_roc_auc value: [0.95588235 0.94906223 0.89098465 0.91922421 0.79411765 0.90441176 0.90441176 0.83088235 0.91176471 0.91911765] mean value: 0.8979859335038363 key: train_roc_auc value: [0.94294626 0.94048732 0.91194053 0.94623813 0.83550489 0.9519544 0.94381107 0.84527687 0.95276873 0.9218241 ] mean value: 0.9192752310152983 key: test_jcc value: [0.91176471 0.90410959 0.79166667 0.86075949 0.70833333 0.83950617 0.82894737 0.66666667 0.84615385 0.85897436] mean value: 0.8216882201649766 key: train_jcc value: [0.89263804 0.89006024 0.83045526 0.90104948 0.75184275 0.91207154 0.89465649 0.6984127 0.91212121 0.8618705 ] mean value: 0.8545178201608485 MCC on Blind test: 0.33 Accuracy on Blind test: 0.63 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.36904597 0.34564185 0.41399765 0.32139015 0.33926725 0.32440686 0.38246441 0.38453221 0.37067652 0.35346103] mean value: 0.36048839092254636 key: score_time value: [0.01911259 0.01781178 0.01721931 0.01834321 0.01791716 0.01647162 0.01970482 0.02053666 0.01990604 0.01659083] mean value: 0.018361401557922364 key: test_mcc value: [0.95629932 0.90025835 0.94199209 0.92944673 0.8722811 0.91334626 0.8979331 0.91533482 0.88388348 0.95681396] mean value: 0.9167589213960957 key: train_mcc value: [0.94968259 0.96455457 0.95301603 0.95813054 0.97253071 0.94640596 0.95126624 0.9726856 0.9534379 
0.97586128] mean value: 0.9597571420489015 key: test_accuracy value: [0.97810219 0.94890511 0.97080292 0.96350365 0.93382353 0.95588235 0.94852941 0.95588235 0.94117647 0.97794118] mean value: 0.9574549162730785 key: train_accuracy value: [0.97473513 0.98207009 0.97636512 0.97881011 0.98615635 0.97312704 0.97557003 0.98615635 0.97638436 0.98778502] mean value: 0.9797159593192262 key: test_fscore value: [0.97777778 0.95035461 0.97142857 0.96503497 0.93706294 0.95714286 0.94964029 0.95774648 0.94285714 0.97841727] mean value: 0.9587462894063403 key: train_fscore value: [0.97502015 0.9823435 0.97663175 0.97913323 0.98630137 0.97336562 0.97576737 0.98634538 0.97681855 0.98793242] mean value: 0.9799659321423493 key: test_precision value: [0.98507463 0.91780822 0.95774648 0.93243243 0.89333333 0.93055556 0.92957746 0.91891892 0.91666667 0.95774648] mean value: 0.9339860175485872 key: train_precision value: [0.96491228 0.96835443 0.96496815 0.96366509 0.97607656 0.9648 0.96794872 0.97305864 0.95918367 0.97615262] mean value: 0.9679120157573049 key: test_recall value: [0.97058824 0.98529412 0.98550725 1. 0.98529412 0.98529412 0.97058824 1. 0.97058824 1. ] mean value: 0.9853154305200341 key: train_recall value: [0.98534202 0.99674267 0.98858075 0.99510604 0.99674267 0.98208469 0.98371336 1. 0.99511401 1. 
] mean value: 0.9923426199977682 key: test_roc_auc value: [0.97804774 0.9491688 0.9706948 0.96323529 0.93382353 0.95588235 0.94852941 0.95588235 0.94117647 0.97794118] mean value: 0.9574381926683717 key: train_roc_auc value: [0.97472647 0.98205812 0.97637507 0.97882338 0.98615635 0.97312704 0.97557003 0.98615635 0.97638436 0.98778502] mean value: 0.9797162191603211 key: test_jcc value: [0.95652174 0.90540541 0.94444444 0.93243243 0.88157895 0.91780822 0.90410959 0.91891892 0.89189189 0.95774648] mean value: 0.9210858066684366 key: train_jcc value: [0.95125786 0.96529968 0.95433071 0.9591195 0.97297297 0.94811321 0.95268139 0.97305864 0.9546875 0.97615262] mean value: 0.9607674080522771 MCC on Blind test: 0.65 Accuracy on Blind test: 0.91 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, 
num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.22876954 0.08565283 0.19133615 0.09375477 0.20426679 0.21190095 0.19379854 0.18301129 0.19188833 0.18430281] mean value: 0.17686820030212402 key: score_time value: [0.04507017 0.02498293 0.02771354 0.02370334 0.02618623 0.02305579 0.02408266 0.0239079 0.02499819 0.0262382 ] mean value: 0.02699389457702637 key: test_mcc value: [1. 
0.95713391 0.92944673 0.90246052 0.8753478 0.92898531 0.91533482 0.95681396 0.95681396 0.98540068] mean value: 0.9407737686262707 key: train_mcc value: [0.99837133 0.99837133 0.99674532 1. 0.99837266 0.99350642 0.99674796 1. 0.99837266 0.99674796] mean value: 0.9977235643412998 key: test_accuracy value: [1. 0.97810219 0.96350365 0.94890511 0.93382353 0.96323529 0.95588235 0.97794118 0.97794118 0.99264706] mean value: 0.9691981537140404 key: train_accuracy value: [0.999185 0.999185 0.99837001 1. 0.99918567 0.99674267 0.99837134 1. 0.99918567 0.99837134] mean value: 0.9988596693824349 key: test_fscore value: [1. 0.97841727 0.96503497 0.95172414 0.93793103 0.96453901 0.95774648 0.97841727 0.97841727 0.99270073] mean value: 0.9704928151902354 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] key: train_fscore value: [0.99918633 0.99918633 0.99837134 1. 0.99918633 0.99675325 0.99837398 1. 0.99918633 0.99837398] mean value: 0.998861787113732 key: test_precision value: [1. 0.95774648 0.93243243 0.90789474 0.88311688 0.93150685 0.91891892 0.95774648 0.95774648 0.98550725] mean value: 0.9432616503621938 key: train_precision value: [0.99837398 0.99837398 0.99674797 1. 0.99837398 0.99352751 0.99675325 1. 0.99837398 0.99675325] mean value: 0.9977277904036133 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1.
0.97826087 0.96323529 0.94852941 0.93382353 0.96323529 0.95588235 0.97794118 0.97794118 0.99264706] mean value: 0.9691496163682864 key: train_roc_auc value: [0.99918434 0.99918434 0.99837134 1. 0.99918567 0.99674267 0.99837134 1. 0.99918567 0.99837134] mean value: 0.9988596691659006 key: test_jcc value: [1. 0.95774648 0.93243243 0.90789474 0.88311688 0.93150685 0.91891892 0.95774648 0.95774648 0.98550725] mean value: 0.9432616503621938 key: train_jcc value: [0.99837398 0.99837398 0.99674797 1. 0.99837398 0.99352751 0.99675325 1. 0.99837398 0.99675325] mean value: 0.9977277904036133 MCC on Blind test: 0.57 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', 
use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [1.34814286 1.62543893 1.64354491 1.46121097 1.30118752 1.36093926 1.41096067 1.34231901 1.25550342 1.57961941] mean value: 1.4328866958618165 key: score_time value: [0.08763576 0.07004118 0.08638644 0.06822419 0.05681825 0.07667327 0.08930254 0.07778025 0.07974267 0.04862738] mean value: 0.0741231918334961 key: test_mcc value: [0.90025835 0.91281179 0.92709446 0.88920184 0.8753478 0.90184995 0.8722811 0.88388348 0.8623165 0.89949371] mean value: 0.8924538968583584 key: train_mcc value: [0.97232223 0.97558234 0.97235367 0.97555143 0.97232431 0.97070464 0.97234494 0.96744724 0.97070464 0.96911836] mean value: 0.9718453807362983 key: test_accuracy value: [0.94890511 0.95620438 0.96350365 0.94160584 0.93382353 0.94852941 0.93382353 0.94117647 0.92647059 0.94852941] mean value: 0.9442571919278661 key: train_accuracy value: [0.98614507 0.98777506 0.98614507 0.98777506 0.98615635 0.98534202 0.98615635 0.98371336 0.98534202 0.98452769] mean value: 0.9859078045814983 key: test_fscore value: [0.95035461 0.95652174 0.96402878 0.94520548 0.93793103 0.95104895 0.93706294 0.94285714 0.93150685 0.95035461] mean value: 0.946687213018592 key: train_fscore value: [0.98621249 0.98783455 0.98621249 0.98777506 0.98619009 0.98538961 0.98621249 0.98376623 0.98538961 0.98461538] mean value: 0.9859598009108499 key: test_precision value: [0.91780822 0.94285714 0.95714286 0.8961039 0.88311688 0.90666667 0.89333333 0.91666667 0.87179487 0.91780822] mean value: 0.9103298756038481 key: train_precision value: [0.9822294 0.98384491 0.98064516 0.98697068 0.98379254 0.98220065 
0.9822294 0.98058252 0.98220065 0.97906602] mean value: 0.9823761946884859 key: test_recall value: [0.98529412 0.97058824 0.97101449 1. 1. 1. 0.98529412 0.97058824 1. 0.98529412] mean value: 0.9868073316283035 key: train_recall value: [0.99022801 0.99185668 0.99184339 0.98858075 0.98859935 0.98859935 0.99022801 0.98697068 0.98859935 0.99022801] mean value: 0.9895733589810353 key: test_roc_auc value: [0.9491688 0.95630861 0.96344842 0.94117647 0.93382353 0.94852941 0.93382353 0.94117647 0.92647059 0.94852941] mean value: 0.9442455242966752 key: train_roc_auc value: [0.98614174 0.98777173 0.98614971 0.98777572 0.98615635 0.98534202 0.98615635 0.98371336 0.98534202 0.98452769] mean value: 0.9859076682731905 key: test_jcc value: [0.90540541 0.91666667 0.93055556 0.8961039 0.88311688 0.90666667 0.88157895 0.89189189 0.87179487 0.90540541] mean value: 0.8989186189975663 key: train_jcc value: [0.9728 0.97596154 0.9728 0.97584541 0.97275641 0.9712 0.9728 0.96805112 0.9712 0.96969697] mean value: 0.97231114472538 MCC on Blind test: 0.26 Accuracy on Blind test: 0.83 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [1.52704644 1.46939826 1.52141833 3.05834937 1.42187619 1.40447807 1.40688801 1.39297247 1.40064192 1.55902958] mean value: 1.616209864616394 key: score_time value: [0.01008701 0.00988674 0.01376224 0.00981331 0.01079798 0.00973082 0.00974679 0.00969934 0.00993872 0.01312947] mean value: 0.010659241676330566 key: test_mcc value: [1. 0.92951942 0.94318882 0.90246052 0.90184995 0.92898531 0.92898531 0.95681396 0.94280904 0.97100831] mean value: 0.9405620639400689 key: train_mcc value: [0.99188292 0.9886543 0.98865451 0.98543628 0.99674796 0.9902753 0.9902753 0.99188957 0.98705447 0.98705447] mean value: 0.9897925072080538 key: test_accuracy value: [1. 0.96350365 0.97080292 0.94890511 0.94852941 0.96323529 0.96323529 0.97794118 0.97058824 0.98529412] mean value: 0.9692035208243881 key: train_accuracy value: [0.99592502 0.99429503 0.99429503 0.99266504 0.99837134 0.99511401 0.99511401 0.99592834 0.99348534 0.99348534] mean value: 0.9948678485434934 key: test_fscore value: [1. 0.96453901 0.97183099 0.95172414 0.95104895 0.96453901 0.96453901 0.97841727 0.97142857 0.98550725] mean value: 0.9703574180164507 key: train_fscore value: [0.99594485 0.99433198 0.99432279 0.99271255 0.99837398 0.99513776 0.99513776 0.99594485 0.99352751 0.99352751] mean value: 0.9948961550938449 key: test_precision value: [1. 
0.93150685 0.94520548 0.90789474 0.90666667 0.93150685 0.93150685 0.95774648 0.94444444 0.97142857] mean value: 0.9427906925652287 key: train_precision value: [0.99192246 0.98872786 0.98870968 0.98553055 0.99675325 0.99032258 0.99032258 0.99192246 0.98713826 0.98713826] mean value: 0.9898487928857995 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.96376812 0.97058824 0.94852941 0.94852941 0.96323529 0.96323529 0.97794118 0.97058824 0.98529412] mean value: 0.9691709292412618 key: train_roc_auc value: [0.9959217 0.99429038 0.99429967 0.99267101 0.99837134 0.99511401 0.99511401 0.99592834 0.99348534 0.99348534] mean value: 0.9948681127152733 key: test_jcc value: [1. 0.93150685 0.94520548 0.90789474 0.90666667 0.93150685 0.93150685 0.95774648 0.94444444 0.97142857] mean value: 0.9427906925652287 key: train_jcc value: [0.99192246 0.98872786 0.98870968 0.98553055 0.99675325 0.99032258 0.99032258 0.99192246 0.98713826 0.98713826] mean value: 0.9898487928857995 MCC on Blind test: 0.69 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.18439102 0.16080046 0.20425344 0.16646791 0.21126723 0.14091587 0.19247937 0.15423417 0.18020916 0.15448427] mean value: 0.1749502897262573 key: score_time value: [0.02884722 0.02191401 0.04009485 0.03732443 0.02903533 0.03681111 0.02155042 0.04020762 0.04215455 0.03590298] mean value: 0.03338425159454346 key: test_mcc value: [0.95713391 1. 1. 1. 1. 1. 0.97100831 1. 0.95681396 1. ] mean value: 0.9884956187492568 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97810219 1. 1. 1. 1. 1. 0.98529412 1. 0.97794118 1. ] mean value: 0.9941337483898669 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97841727 1. 1. 1. 1. 1. 0.98550725 1. 0.97841727 1. ] mean value: 0.9942341778750912 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95774648 1. 1. 1. 1. 1. 0.97142857 1. 0.95774648 1. ] mean value: 0.988692152917505 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97826087 1. 1. 1. 1. 1. 0.98529412 1. 0.97794118 1. ] mean value: 0.9941496163682865 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.95774648 1. 1. 1. 1. 1. 0.97142857 1. 0.95774648 1. ] mean value: 0.988692152917505 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.86 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.06424856 0.0752933 0.07174444 0.06764746 0.06863308 0.06759048 0.0916636 0.07553363 0.07163095 0.07230806] mean value: 0.07262935638427734 key: score_time value: [0.04801512 0.03882694 0.02715278 0.02783155 0.03385282 0.02948785 0.02757883 0.02810884 0.02770877 0.03306103] mean value: 0.03216245174407959 key: test_mcc value: [0.88654289 0.8251972 0.82480818 0.82788248 0.75008111 0.76603235 0.79549513 0.88273483 0.85442069 0.87000211] mean value: 0.8283196987346125 key: train_mcc value: [0.85838091 0.88143034 0.87309027 0.87137251 0.88488253 0.87307997 0.87638889 0.87175299 0.88330335 0.87485394] mean value: 0.8748535686741495 key: test_accuracy value: [0.94160584 0.91240876 0.91240876 0.91240876 0.875 0.88235294 0.89705882 0.94117647 0.92647059 0.93382353] mean value: 0.9134714469729498 key: train_accuracy value: [0.92909535 0.9405053 0.93643032 0.93561532 0.94218241 0.93648208 0.93811075 0.93566775 0.94136808 0.93729642] mean value: 0.9372753783625218 key: test_fscore value: [0.93846154 0.91304348 0.91304348 0.91666667 0.87591241 0.88571429 0.9 0.94202899 0.92857143 0.93617021] mean value: 
0.9149612482967987 key: train_fscore value: [0.92989525 0.9414595 0.93709677 0.93613581 0.94315452 0.93699515 0.93870968 0.93664796 0.9424 0.9380531 ] mean value: 0.9380547742168248 key: test_precision value: [0.98387097 0.9 0.91304348 0.88 0.86956522 0.86111111 0.875 0.92857143 0.90277778 0.90410959] mean value: 0.9018049569895523 key: train_precision value: [0.92025518 0.92733017 0.92663477 0.92788462 0.92755906 0.92948718 0.92971246 0.92259084 0.92610063 0.92686804] mean value: 0.9264422946711286 key: test_recall value: [0.89705882 0.92647059 0.91304348 0.95652174 0.88235294 0.91176471 0.92647059 0.95588235 0.95588235 0.97058824] mean value: 0.9296035805626599 key: train_recall value: [0.93973941 0.95602606 0.94779772 0.94453507 0.95928339 0.94462541 0.94788274 0.95114007 0.95928339 0.9495114 ] mean value: 0.9499824646237067 /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:196: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_cd_sl.py:199: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_roc_auc value: [0.94128303 0.91251066 0.91240409 0.9120844 0.875 0.88235294 0.89705882 0.94117647 0.92647059 0.93382353] mean value: 0.913416453537937 key: train_roc_auc value: [0.92908667 0.94049264 0.93643957 0.93562259 0.94218241 0.93648208 0.93811075 0.93566775 0.94136808 0.93729642] mean value: 0.9372748962490236 key: test_jcc value: [0.88405797 0.84 0.84 0.84615385 0.77922078 0.79487179 0.81818182 0.89041096
0.86666667 0.88 ] mean value: 0.8439563835013507 key: train_jcc value: [0.8689759 0.88939394 0.88163885 0.87993921 0.89242424 0.88145897 0.88449848 0.88084465 0.89107413 0.88333333] mean value: 0.8833581697694837 MCC on Blind test: 0.61 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
 BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.65006852 0.65783858 0.64332271 0.60425997 0.60372329 0.5961287 0.64778042 0.60070848 0.55460596 0.62646484]
mean value: 0.6184901475906373
key: score_time
value: [0.02721882 0.02727461 0.02708626 0.02812886 0.02766991 0.03826523 0.02749848 0.05201983 0.02817082 0.02796197]
mean value: 0.03112947940826416
key: test_mcc
value: [0.88654289 0.84393916 0.83951407 0.82788248 0.72129053 0.76603235 0.79549513 0.84051051 0.8131434 0.87000211]
mean value: 0.8204352632641159
key: train_mcc
value: [0.85838091 0.87678779 0.88469777 0.87137251 0.88015146 0.87307997 0.87638889 0.87175299 0.87811224 0.87485394]
mean value: 0.8745578471519682
key: test_accuracy
value: [0.94160584 0.91970803 0.91970803 0.91240876 0.86029412 0.88235294 0.89705882 0.91911765 0.90441176 0.93382353]
mean value: 0.9090489480463718
key: train_accuracy
value: [0.92909535 0.93806031 0.94213529 0.93561532 0.93973941 0.93648208 0.93811075 0.93566775
0.93892508 0.93729642]
mean value: 0.9371127773839958
key: test_fscore
value: [0.93846154 0.92307692 0.91970803 0.91666667 0.86330935 0.88571429 0.9 0.92198582 0.90909091 0.93617021]
mean value: 0.9114183733094183
key: train_fscore
value: [0.92989525 0.93929712 0.94297189 0.93613581 0.94089457 0.93699515 0.93870968 0.93664796 0.93966211 0.9380531 ]
mean value: 0.9379262630193704
key: test_precision
value: [0.98387097 0.88 0.92647059 0.88 0.84507042 0.86111111 0.875 0.89041096 0.86666667 0.90410959]
mean value: 0.8912710304235424
key: train_precision
value: [0.92025518 0.92163009 0.92879747 0.92788462 0.92319749 0.92948718 0.92971246 0.92259084 0.92845787 0.92686804]
mean value: 0.9258881244342322
key: test_recall
value: [0.89705882 0.97058824 0.91304348 0.95652174 0.88235294 0.91176471 0.92647059 0.95588235 0.95588235 0.97058824]
mean value: 0.9340153452685422
key: train_recall
value: [0.93973941 0.95765472 0.95758564 0.94453507 0.95928339 0.94462541 0.94788274 0.95114007 0.95114007 0.9495114 ]
mean value: 0.9503097916478471
key: test_roc_auc
value: [0.94128303 0.92007673 0.91975703 0.9120844 0.86029412 0.88235294 0.89705882 0.91911765 0.90441176 0.93382353]
mean value: 0.9090260017050299
key: train_roc_auc
value: [0.92908667 0.93804433 0.94214787 0.93562259 0.93973941 0.93648208 0.93811075 0.93566775 0.93892508 0.93729642]
mean value: 0.9371122954870318
key: test_jcc
value: [0.88405797 0.85714286 0.85135135 0.84615385 0.75949367 0.79487179 0.81818182 0.85526316 0.83333333 0.88 ]
mean value: 0.8379849800830307
key: train_jcc
value: [0.8689759 0.88554217 0.89209726 0.87993921 0.88838612 0.88145897 0.88449848 0.88084465 0.8861912 0.88333333]
mean value: 0.8831267294611943

MCC on Blind test: 0.61
Accuracy on Blind test: 0.89