/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 817
PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation
or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (431, 175)
Test data size: 0.07559289460184544 (36, 175)
y_train numbers: Counter({1: 285, 0: 146})
y_train ratio: 0.512280701754386
y_test_numbers: Counter({1: 24, 0: 12})
y_test ratio: 0.5
-------------------------------------------------------------
Simple Random OverSampling
Counter({1: 285, 0: 285})
(570, 175)
Simple Random UnderSampling
Counter({0: 146, 1: 146})
(292, 175)
Simple Combined Over and UnderSampling
Counter({0: 285, 1: 285})
(570, 175)
SMOTE_NC OverSampling
Counter({1: 285, 0: 285})
(570, 175)
#####################################################################
Running ML analysis: scaling law split
Gene name: katG
Drug name: isoniazid
Output directory: /home/tanu/git/Data/isoniazid/output/ml/tts_sl/
Sanity checks:
ML source data size: (467, 175)
Total input features: (431, 175)
Target feature numbers: Counter({1: 285, 0: 146})
Target features ratio: 0.512280701754386
#####################################################################
================================================================
Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
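Note on the split sizes reported earlier in this log: the figures (test fraction 0.0756, 431 training rows, 36 test rows out of 467) are consistent with a test fraction of 1/sqrt(n_cols) where the test count is rounded up. The sketch below only reproduces that arithmetic. The helper name `scaling_law_split` is hypothetical, and the round-up rule is an assumption based on how scikit-learn's `train_test_split` treats a float `test_size`; the actual call in `ml_data_sl.py` is not shown in the log.

```python
import math


def scaling_law_split(n_rows: int, n_cols: int):
    """Return (test_fraction, n_train, n_test) for a 1/sqrt(n_cols) split.

    Hypothetical helper: it mirrors the arithmetic implied by the log,
    not the original script.
    """
    test_fraction = 1 / math.sqrt(n_cols)
    # Assumption: the test count is rounded up, which is what
    # train_test_split does when test_size is given as a float.
    n_test = math.ceil(test_fraction * n_rows)
    n_train = n_rows - n_test
    return test_fraction, n_train, n_test


# Values taken from the log: 467 source rows, 175 feature columns.
frac, n_train, n_test = scaling_law_split(467, 175)
print(f"test fraction: {frac:.6f}, train rows: {n_train}, test rows: {n_test}")
```

With 175 columns the held-out set is small (36 rows), so test metrics from this split are likely to be noisier than the 10-fold cross-validation means.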
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None,
              enable_categorical=False, gamma=None, gpu_id=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=42,
              reg_alpha=None, reg_lambda=None, scale_pos_weight=None,
              subsample=None, tree_method=None, use_label_encoder=False,
              validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03372955 0.03613591 0.03644919 0.03534794 0.03659058 0.03599262 0.03608441 0.04660845 0.03674316 0.03684807]
mean value: 0.03705298900604248
key: score_time
value: [0.01260567 0.01232028 0.01445866 0.0145154 0.01467729 0.0162003 0.01573014 0.01584959 0.01475835 0.014889 ]
mean value: 0.014600467681884766
key: test_mcc
value: [0.58131836 0.74048587 0.78481149 0.89408867 0.53276418 0.8993825 0.80104099 0.63660014 0.95079854 0.79313677]
mean value: 0.7614427513528801
key: train_mcc
value: [0.86656096 0.8442975 0.838564 0.85620977 0.85050655 0.83787173 0.86186304 0.84413863 0.83211139 0.86186304]
mean value: 0.8493986619628706
key: test_accuracy
value: [0.81818182 0.88372093 0.90697674 0.95348837 0.79069767 0.95348837 0.90697674 0.8372093 0.97674419 0.90697674]
mean value: 0.893446088794926
key: train_accuracy
value: 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.94056848 0.93041237 0.92783505 0.93556701 0.93298969 0.92783505 0.93814433 0.93041237 0.92525773 0.93814433] mean value: 0.9327166413596526 key: test_fscore value: [0.87096774 0.92063492 0.93333333 0.96551724 0.84210526 0.96551724 0.93333333 0.8852459 0.98181818 0.93103448] mean value: 0.9229507641369733 key: train_fscore value: [0.95619048 0.9489603 0.94716981 0.95274102 0.9509434 0.94736842 0.95488722 0.94934334 0.94559099 0.95488722] mean value: 0.9508082198090645 key: test_precision value: [0.81818182 0.85294118 0.90322581 0.96551724 0.85714286 0.93333333 0.875 0.81818182 1. 
0.9 ] mean value: 0.8923524051141338 key: train_precision value: [0.9330855 0.91941392 0.91605839 0.92307692 0.91970803 0.91636364 0.92363636 0.91666667 0.91304348 0.92363636] mean value: 0.9204689276271143 key: test_recall value: [0.93103448 1. 0.96551724 0.96551724 0.82758621 1. 1. 0.96428571 0.96428571 0.96428571] mean value: 0.9582512315270936 key: train_recall value: [0.98046875 0.98046875 0.98046875 0.984375 0.984375 0.98054475 0.98832685 0.9844358 0.98054475 0.98832685] mean value: 0.9832335238326848 key: test_roc_auc value: [0.76551724 0.82142857 0.87561576 0.94704433 0.77093596 0.93333333 0.86666667 0.78214286 0.98214286 0.88214286] mean value: 0.8626970443349754 key: train_roc_auc value: [0.92153208 0.90690104 0.90311316 0.91264205 0.90885417 0.90248611 0.91401075 0.90443164 0.89866932 0.91401075] mean value: 0.908665107972322 key: test_jcc value: [0.77142857 0.85294118 0.875 0.93333333 0.72727273 0.93333333 0.875 0.79411765 0.96428571 0.87096774] mean value: 0.8597680245118575 key: train_jcc value: [0.91605839 0.9028777 0.89964158 0.90974729 0.90647482 0.9 0.91366906 0.90357143 0.89679715 0.91366906] mean value: 0.9062506492718643 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.95363975 0.8347497 0.98288345 0.84292221 0.82918787 0.90894675 0.84772086 0.95350075 0.82799745 0.85421014] mean value: 0.8835758924484253 key: score_time value: [0.01479244 0.01366854 0.0137701 0.01498795 0.01960564 0.01608729 0.01490307 0.01511002 0.01493359 0.01592612] mean value: 0.015378475189208984 key: test_mcc value: [0.85146932 0.84515772 0.81883947 0.84515772 0.9025825 0.94928891 0.85004744 0.7412616 0.95079854 0.6479516 ] mean value: 0.8402554820628604 key: train_mcc value: [0.98846016 0.97701629 1. 0.98276159 0.98854135 0.9769295 0.98847536 1. 0.9769295 1. ] mean value: 0.9879113755069235 key: test_accuracy value: [0.93181818 0.93023256 0.90697674 0.93023256 0.95348837 0.97674419 0.93023256 0.88372093 0.97674419 0.8372093 ] mean value: 0.9257399577167019 key: train_accuracy value: [0.99483204 0.98969072 1. 0.99226804 0.99484536 0.98969072 0.99484536 1. 0.98969072 1. ] mean value: 0.994586296917872 key: test_fscore value: [0.95081967 0.94736842 0.92592593 0.94736842 0.96428571 0.98245614 0.94915254 0.91525424 0.98181818 0.87272727] mean value: 0.94371765290054 key: train_fscore value: [0.99609375 0.9922179 1. 0.99415205 0.99610895 0.99224806 0.99610895 1. 0.99224806 1. ] mean value: 0.9959177718480003 key: test_precision value: [0.90625 0.96428571 1. 0.96428571 1. 0.96551724 0.90322581 0.87096774 1. 0.88888889] mean value: 0.9463421107226725 key: train_precision value: [0.99609375 0.98837209 1. 0.9922179 0.99224806 0.98841699 0.99610895 1. 0.98841699 1. ] mean value: 0.9941874730121764 key: test_recall value: [1. 
0.93103448 0.86206897 0.93103448 0.93103448 1. 1. 0.96428571 0.96428571 0.85714286] mean value: 0.9440886699507389 key: train_recall value: [0.99609375 0.99609375 1. 0.99609375 1. 0.99610895 0.99610895 1. 0.99610895 1. ] mean value: 0.9976608098249027 key: test_roc_auc value: [0.9 0.92980296 0.93103448 0.92980296 0.96551724 0.96666667 0.9 0.84880952 0.98214286 0.82857143] mean value: 0.9182348111658457 key: train_roc_auc value: [0.99423008 0.98668324 1. 0.99047112 0.99242424 0.98660409 0.99423768 1. 0.98660409 1. ] mean value: 0.9931254546464323 key: test_jcc value: [0.90625 0.9 0.86206897 0.9 0.93103448 0.96551724 0.90322581 0.84375 0.96428571 0.77419355] mean value: 0.8950325758779596 key: train_jcc value: [0.9922179 0.98455598 1. 0.98837209 0.99224806 0.98461538 0.99224806 1. 0.98461538 1. ] mean value: 0.9918872869673703 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, 
min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01457953 0.01114249 0.01063085 0.01069617 0.01100039 0.0102458 0.01035047 0.0098381 0.01021338 0.01029086] mean value: 0.010898804664611817 key: score_time value: [0.01245618 0.01024842 0.01079679 0.00953054 0.00977015 0.00972557 0.00895238 0.00899076 0.00913095 0.00907373] mean value: 0.009867548942565918 key: test_mcc value: [0.45305024 0.56055699 0.51517946 0.57635468 0.35173219 0.7412616 0.40616479 0.40241617 0.74102654 0.5892454 ] mean value: 0.5336988065510012 key: train_mcc value: [0.62582364 0.58240004 0.54248922 0.58757265 0.59314497 0.58075708 0.63502039 0.58214114 0.60024604 0.61735991] mean value: 0.5946955073947634 key: test_accuracy value: [0.75 0.81395349 0.79069767 0.81395349 0.69767442 0.88372093 0.72093023 0.74418605 0.88372093 0.79069767] mean value: 0.788953488372093 key: train_accuracy value: [0.83462532 0.81443299 0.79639175 0.81701031 0.81701031 0.81443299 0.83762887 0.81443299 0.82216495 0.81701031] mean value: 0.8185140786914942 key: test_fscore value: [0.80701754 0.875 0.84745763 0.86206897 0.76363636 0.91525424 0.77777778 0.82539683 0.9122807 0.82352941] mean value: 0.8409419454113729 key: train_fscore value: [0.87692308 0.86100386 0.84719536 0.86319846 0.86105675 0.86153846 0.87814313 0.86100386 0.86653772 0.85420945] mean value: 0.8630810124993853 key: test_precision value: [0.82142857 0.8 0.83333333 0.86206897 0.80769231 0.87096774 0.80769231 0.74285714 0.89655172 0.91304348] mean value: 0.8355635572855189 key: train_precision value: [0.86363636 0.85114504 0.83908046 0.85171103 0.8627451 0.85171103 0.87307692 0.85440613 0.86153846 
0.90434783] mean value: 0.8613398353816113 key: test_recall value: [0.79310345 0.96551724 0.86206897 0.86206897 0.72413793 0.96428571 0.75 0.92857143 0.92857143 0.75 ] mean value: 0.8528325123152709 key: train_recall value: [0.890625 0.87109375 0.85546875 0.875 0.859375 0.87159533 0.88326848 0.86770428 0.87159533 0.80933852] mean value: 0.8655064445525292 key: test_roc_auc value: [0.72988506 0.73275862 0.75246305 0.78817734 0.68349754 0.84880952 0.70833333 0.66428571 0.86428571 0.80833333] mean value: 0.7580829228243021 key: train_roc_auc value: [0.80790792 0.7878196 0.76864347 0.78977273 0.79711174 0.7869427 0.81568004 0.78881397 0.79839309 0.8206998 ] mean value: 0.796178505644296 key: test_jcc value: [0.67647059 0.77777778 0.73529412 0.75757576 0.61764706 0.84375 0.63636364 0.7027027 0.83870968 0.7 ] mean value: 0.7286291316545112 key: train_jcc value: [0.78082192 0.7559322 0.73489933 0.75932203 0.75601375 0.75675676 0.78275862 0.7559322 0.76450512 0.74551971] mean value: 0.7592461643211699 MCC on Blind test: 0.64 Accuracy on Blind test: 0.83 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01165366 0.01043344 0.01005793 0.01009703 0.00994444 0.00998759 0.01104665 0.011199 0.01103497 0.01107717] mean value: 0.010653185844421386 key: score_time value: [0.00956798 0.00930119 0.00931573 0.00896025 0.00888276 0.009022 0.00979853 0.00989366 0.00984979 0.00986552] mean value: 0.009445738792419434 key: test_mcc value: [0.49425287 0.55732713 0.67416594 0.74102654 0.36453202 0.79313677 0.63660014 0.46554006 0.63247577 0.52368994] mean value: 0.5882747173578298 key: train_mcc value: [0.65753645 0.65942575 0.67797225 0.64082272 0.65903595 0.64878733 0.64546126 0.66416419 0.6450794 0.6828107 ] mean value: 0.6581095999527902 key: test_accuracy value: [0.77272727 0.81395349 0.86046512 0.88372093 0.72093023 0.90697674 0.8372093 0.76744186 0.8372093 0.79069767] mean value: 0.8191331923890064 key: train_accuracy value: [0.8501292 0.85051546 0.85824742 0.84278351 0.85051546 0.84536082 0.84536082 0.85309278 0.84536082 0.86082474] mean value: 0.8502191054636511 key: test_fscore value: [0.82758621 0.87096774 0.9 0.9122807 0.79310345 0.93103448 0.8852459 0.83333333 0.88135593 0.84745763] mean value: 0.8682365375915616 key: train_fscore value: [0.89056604 0.89056604 0.89563567 0.88555347 0.89097744 0.88549618 0.88764045 0.89265537 0.8880597 0.89772727] mean value: 0.8904877637720091 key: test_precision value: [0.82758621 0.81818182 0.87096774 0.92857143 0.79310345 0.9 0.81818182 0.78125 0.83870968 0.80645161] mean value: 0.8383003752365543 key: train_precision value: [0.86131387 0.86131387 0.87084871 0.85198556 0.85869565 0.86891386 0.85559567 0.8649635 0.85304659 0.87453875] 
mean value: 0.8621216027021169 key: test_recall value: [0.82758621 0.93103448 0.93103448 0.89655172 0.79310345 0.96428571 0.96428571 0.89285714 0.92857143 0.89285714] mean value: 0.9022167487684729 key: train_recall value: [0.921875 0.921875 0.921875 0.921875 0.92578125 0.90272374 0.92217899 0.92217899 0.92607004 0.92217899] mean value: 0.9208611989299611 key: test_roc_auc value: [0.74712644 0.75123153 0.8226601 0.87684729 0.68226601 0.88214286 0.78214286 0.71309524 0.79761905 0.74642857] mean value: 0.7801559934318555 key: train_roc_auc value: [0.81589933 0.81699811 0.82836174 0.80563447 0.81516335 0.81777408 0.80841774 0.81986812 0.80654647 0.8313185 ] mean value: 0.8165981914150152 key: test_jcc value: [0.70588235 0.77142857 0.81818182 0.83870968 0.65714286 0.87096774 0.79411765 0.71428571 0.78787879 0.73529412] mean value: 0.7693889285919646 key: train_jcc value: [0.80272109 0.80272109 0.81099656 0.79461279 0.80338983 0.79452055 0.7979798 0.80612245 0.79865772 0.81443299] mean value: 0.8026154868282023 MCC on Blind test: 0.52 Accuracy on Blind test: 0.78 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00989199 0.0111444 0.01048923 0.0103128 0.010391 0.01045656 0.01054001 0.01035523 0.01011539 0.01033711] mean value: 0.01040337085723877 key: score_time value: [0.05038977 0.01279974 0.01173329 0.01209927 0.01200771 0.01221824 0.01208305 0.01429176 0.01597929 0.01330972] mean value: 0.016691184043884276 key: test_mcc value: [0.21152604 0.43985131 0.57635468 0.40711743 0.27510532 0.58298976 0.52368994 0.24187277 0.63247577 0.34309924] mean value: 0.42340822539565004 key: train_mcc value: [0.69961993 0.65931339 0.67101659 0.6346185 0.66497487 0.63246023 0.68748675 0.64455576 0.62607402 0.65075438] mean value: 0.6570874415706287 key: test_accuracy value: [0.68181818 0.76744186 0.81395349 0.74418605 0.69767442 0.81395349 0.79069767 0.6744186 0.8372093 0.72093023] mean value: 0.7542283298097252 key: train_accuracy value: [0.86821705 0.85051546 0.8556701 0.84020619 0.85309278 0.84020619 0.86340206 0.84536082 0.83762887 0.84793814] mean value: 0.8502237672820266 key: test_fscore value: [0.78787879 0.83870968 0.86206897 0.81355932 0.78688525 0.87096774 0.84745763 0.76666667 0.88135593 0.80645161] mean value: 0.8262001579578332 key: train_fscore value: [0.90538033 0.89377289 0.89667897 0.88686131 0.89502762 0.88686131 0.90130354 0.88929889 0.88482633 0.89134438] mean value: 0.8931355586193345 key: test_precision value: [0.7027027 0.78787879 0.86206897 0.8 0.75 0.79411765 0.80645161 0.71875 0.83870968 0.73529412] mean value: 0.7795973511127194 key: train_precision value: [0.86219081 0.84137931 0.84965035 0.83219178 0.8466899 0.83505155 0.86428571 0.84561404 0.83448276 
0.84615385] mean value: 0.8457690049548049 key: test_recall value: [0.89655172 0.89655172 0.86206897 0.82758621 0.82758621 0.96428571 0.89285714 0.82142857 0.92857143 0.89285714] mean value: 0.8810344827586207 key: train_recall value: [0.953125 0.953125 0.94921875 0.94921875 0.94921875 0.94552529 0.94163424 0.93774319 0.94163424 0.94163424] mean value: 0.946207745622568 key: test_roc_auc value: [0.5816092 0.69827586 0.78817734 0.69950739 0.62807882 0.74880952 0.74642857 0.61071429 0.79761905 0.64642857] mean value: 0.6945648604269294 key: train_roc_auc value: [0.82770754 0.80232008 0.81173059 0.78900331 0.80794271 0.78955654 0.82577895 0.80093266 0.78761101 0.80287819] mean value: 0.8045461582612031 key: test_jcc value: [0.65 0.72222222 0.75757576 0.68571429 0.64864865 0.77142857 0.73529412 0.62162162 0.78787879 0.67567568] mean value: 0.7056059688412629 key: train_jcc value: [0.82711864 0.80794702 0.81270903 0.79672131 0.81 0.79672131 0.82033898 0.80066445 0.79344262 0.80398671] mean value: 0.8069650085778866 MCC on Blind test: 0.33 Accuracy on Blind test: 0.69 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02155566 0.02098107 0.01806235 0.01775026 0.01926017 0.02073121 0.01955342 0.02025127 0.01990175 0.01973605] mean value: 0.01977832317352295 key: score_time value: [0.01182008 0.01138163 0.01180887 0.01104093 0.01332617 0.01129508 0.01093888 0.01217222 0.01113939 0.01160002] mean value: 0.011652326583862305 key: test_mcc value: [0.52525148 0.57957513 0.7300872 0.84383267 0.49188359 0.8993825 0.60246408 0.4630445 0.79313677 0.58298976] mean value: 0.651164767326194 key: train_mcc value: [0.74953754 0.69029892 0.71608944 0.72111098 0.71127097 0.71860826 0.70727056 0.72056414 0.67681039 0.75638254] mean value: 0.7167943743978092 key: test_accuracy value: [0.79545455 0.81395349 0.88372093 0.93023256 0.76744186 0.95348837 0.81395349 0.76744186 0.90697674 0.81395349] mean value: 0.844661733615222 key: train_accuracy value: [0.88888889 0.86340206 0.87371134 0.87628866 0.87113402 0.87628866 0.87113402 0.87628866 0.85824742 0.89175258] mean value: 0.8747136311569301 key: test_fscore value: [0.85714286 0.87878788 0.91803279 0.95081967 0.82142857 0.96551724 0.875 0.83870968 0.93103448 0.87096774] mean value: 0.890744090986847 key: train_fscore value: [0.92051756 0.90275229 0.91042048 0.91176471 0.90909091 0.91143911 0.90842491 0.91240876 0.89981785 0.92279412] mean value: 0.9109430694169829 key: test_precision value: [0.79411765 0.78378378 0.875 0.90625 0.85185185 0.93333333 0.77777778 0.76470588 0.9 0.79411765] mean value: 0.8380937923217335 key: train_precision value: [0.87368421 0.85121107 0.8556701 0.86111111 0.85034014 0.86666667 0.85813149 0.85910653 0.84589041 
0.87456446] mean value: 0.8596376188103771 key: test_recall value: [0.93103448 1. 0.96551724 1. 0.79310345 1. 1. 0.92857143 0.96428571 0.96428571] mean value: 0.954679802955665 key: train_recall value: [0.97265625 0.9609375 0.97265625 0.96875 0.9765625 0.96108949 0.96498054 0.97276265 0.96108949 0.9766537 ] mean value: 0.9688138375486381 key: test_roc_auc value: [0.73218391 0.71428571 0.83990148 0.89285714 0.75369458 0.93333333 0.73333333 0.69761905 0.88214286 0.74880952] mean value: 0.792816091954023 key: train_roc_auc value: [0.84892354 0.81758996 0.82723722 0.83285985 0.82161458 0.83550658 0.82600172 0.82989277 0.80878902 0.85092227] mean value: 0.829933751991992 key: test_jcc value: [0.75 0.78378378 0.84848485 0.90625 0.6969697 0.93333333 0.77777778 0.72222222 0.87096774 0.77142857] mean value: 0.8061217975935718 key: train_jcc value: [0.85273973 0.82274247 0.83557047 0.83783784 0.83333333 0.83728814 0.83221477 0.83892617 0.81788079 0.85665529] mean value: 0.8365189001908526 MCC on Blind test: 0.75 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.38460779 1.60607076 1.54561806 1.4221251 1.57054877 1.54352975 1.41555548 1.55845952 1.54474926 1.41102624] mean value: 1.5002290725708007 key: score_time value: [0.01263404 0.01581979 0.01433444 0.01546478 0.01272631 0.01477623 0.0147686 0.01492023 0.01476097 0.01733851] mean value: 0.014754390716552735 key: test_mcc value: [0.80277297 0.78481149 0.68226601 0.78817734 0.65625201 0.94928891 0.80104099 0.79313677 0.80536675 0.63689536] mean value: 0.7700008592726074 key: train_mcc value: [1. 0.98854135 0.99426489 0.99426489 0.99426489 0.99424345 0.99424345 0.98847536 0.98849821 1. ] mean value: 0.99367964807764 key: test_accuracy value: [0.90909091 0.90697674 0.86046512 0.90697674 0.8372093 0.97674419 0.90697674 0.90697674 0.90697674 0.8372093 ] mean value: 0.8955602536997885 key: train_accuracy value: [1. 0.99484536 0.99742268 0.99742268 0.99742268 0.99742268 0.99742268 0.99484536 0.99484536 1. ] mean value: 0.9971649484536083 key: test_fscore value: [0.93548387 0.93333333 0.89655172 0.93103448 0.87272727 0.98245614 0.93333333 0.93103448 0.92592593 0.87719298] mean value: 0.9219073548749797 key: train_fscore value: [1. 
0.99610895 0.99805068 0.99805068 0.99805068 0.99805825 0.99805825 0.99610895 0.99612403 1. ] mean value: 0.9978610481478432 key: test_precision value: [0.87878788 0.90322581 0.89655172 0.93103448 0.92307692 0.96551724 0.875 0.9 0.96153846 0.86206897] mean value: 0.9096801483647979 key: train_precision value: [1. 0.99224806 0.99610895 0.99610895 0.99610895 0.99612403 0.99612403 0.99610895 0.99227799 1. ] mean value: 0.9961209913974369 key: test_recall value: [1. 0.96551724 0.89655172 0.93103448 0.82758621 1. 1. 0.96428571 0.89285714 0.89285714] mean value: 0.9370689655172414 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 0.99610895 1. 1. ] mean value: 0.9996108949416342 key: test_roc_auc value: [0.86666667 0.87561576 0.841133 0.89408867 0.84236453 0.96666667 0.86666667 0.88214286 0.91309524 0.81309524] mean value: 0.8761535303776683 key: train_roc_auc value: [1. 0.99242424 0.99621212 0.99621212 0.99621212 0.99618321 0.99618321 0.99423768 0.99236641 1. ] mean value: 0.9960031111303128 key: test_jcc value: [0.87878788 0.875 0.8125 0.87096774 0.77419355 0.96551724 0.875 0.87096774 0.86206897 0.78125 ] mean value: 0.8566253117942495 key: train_jcc value: [1. 0.99224806 0.99610895 0.99610895 0.99610895 0.99612403 0.99612403 0.99224806 0.99227799 1. 
] mean value: 0.9957349026573531 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02398276 0.02131772 0.01714778 0.02014828 0.02126026 0.01924372 0.01785946 0.02201104 0.02174163 0.02186418] mean value: 0.020657682418823244 key: score_time value: [0.01223946 0.00949287 0.00887823 0.00909805 0.00917006 0.00899124 0.00920415 0.00913548 0.00919437 0.00912642] mean value: 0.009453034400939942 key: test_mcc value: [0.84691397 0.94928891 1. 0.84515772 0.94928891 0.80536675 0.84515772 0.86258195 0.95079854 0.8993825 ] mean value: 0.8953936967542797 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93181818 0.97674419 1. 0.93023256 0.97674419 0.90697674 0.93023256 0.93023256 0.97674419 0.95348837] mean value: 0.9513213530655391 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 0.98245614 1. 0.94736842 0.98245614 0.92592593 0.94736842 0.94339623 0.98181818 0.96551724] mean value: 0.9625459240718411 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 1. 1. 
0.96428571 1. 0.96153846 0.93103448 1. 1. 0.93333333] mean value: 0.9723525325249464 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.93103448 0.96551724 0.89285714 0.96428571 0.89285714 0.96428571 1. ] mean value: 0.9541871921182267 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91609195 0.98275862 1. 0.92980296 0.98275862 0.91309524 0.91547619 0.94642857 0.98214286 0.93333333] mean value: 0.9501888341543514 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 0.96551724 1. 0.9 0.96551724 0.86206897 0.9 0.89285714 0.96428571 0.93333333] mean value: 0.9286805445203665 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.1146183 0.11344624 0.11548758 0.11227894 0.11485791 0.11730623 0.11329865 0.11277914 0.11396575 0.11266923] mean value: 0.11407079696655273 key: score_time value: [0.01814842 0.01810622 0.01765132 0.01797366 0.01852298 0.01800585 0.01813388 0.01781464 0.01797915 0.01783133] mean value: 0.018016743659973144 key: test_mcc value: [0.58621892 0.67480294 0.84383267 0.84515772 0.62324149 0.8993825 0.8993825 0.5773737 0.94928891 0.68920734] mean value: 0.7587888702546957 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.81818182 0.86046512 0.93023256 0.93023256 0.8372093 0.95348837 0.95348837 0.81395349 0.97674419 0.86046512] mean value: 0.893446088794926 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.875 0.90322581 0.95081967 0.94736842 0.88135593 0.96551724 0.96551724 0.86666667 0.98245614 0.9 ] mean value: 0.9237927121614946 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.84848485 0.90625 0.96428571 0.86666667 0.93333333 0.93333333 0.8125 0.96551724 0.84375 ] mean value: 0.8874121137483206 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.93103448 0.89655172 1. 1. 0.92857143 1. 0.96428571] mean value: 0.9651477832512315 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.74942529 0.80418719 0.89285714 0.92980296 0.80541872 0.93333333 0.93333333 0.76428571 0.96666667 0.81547619] mean value: 0.8594786535303777 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77777778 0.82352941 0.90625 0.9 0.78787879 0.93333333 0.93333333 0.76470588 0.96551724 0.81818182] mean value: 0.8610507586002007 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01006985 0.01007056 0.01002431 0.01016045 0.01020622 0.00985289 0.01042724 0.00990152 0.01044726 0.01008797] mean value: 0.010124826431274414 key: score_time value: [0.00898814 0.00896454 0.00909829 0.00897264 0.00896001 0.00890684 0.00891232 0.00885296 0.00876856 0.00931239] mean value: 0.00897367000579834 key: test_mcc value: [0.41084026 0.19099336 0.40711743 0.55732713 0.23158372 0.60576577 0.53276418 0.13003912 0.38571429 0.36815383] mean value: 0.3820299081033254 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.75 0.65116279 0.74418605 0.81395349 0.6744186 0.81395349 0.79069767 0.62790698 0.72093023 0.72093023] mean value: 0.7308139534883721 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82539683 0.74576271 0.81355932 0.87096774 0.76666667 0.85185185 0.84210526 0.73333333 0.78571429 0.79310345] mean value: 0.8028461450230509 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76470588 0.73333333 0.8 0.81818182 0.74193548 0.88461538 0.82758621 0.6875 0.78571429 0.76666667] mean value: 0.781023906163195 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.89655172 0.75862069 0.82758621 0.93103448 0.79310345 0.82142857 0.85714286 0.78571429 0.78571429 0.82142857] mean value: 0.8278325123152709 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6816092 0.59359606 0.69950739 0.75123153 0.61083744 0.81071429 0.76190476 0.55952381 0.69285714 0.67738095] mean value: 0.6839162561576355 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7027027 0.59459459 0.68571429 0.77142857 0.62162162 0.74193548 0.72727273 0.57894737 0.64705882 0.65714286] mean value: 0.6728419036298793 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.58 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.62458539 1.63666844 1.60537386 1.65038395 1.66536379 1.63138604 1.636024 1.60371423 1.62825584 1.61648202] mean value: 1.6298237562179565 key: score_time value: [0.09931064 0.09000063 0.09789276 0.0941186 0.09213281 0.10003638 0.09792519 0.09421635 0.09055543 0.09021354] mean value: 0.09464023113250733 key: test_mcc value: [0.79532948 0.94742759 0.84383267 0.94742759 0.83936556 0.94928891 0.8993825 0.74102654 1. 0.85004744] mean value: 0.8813128291534343 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90909091 0.97674419 0.93023256 0.97674419 0.93023256 0.97674419 0.95348837 0.88372093 1. 0.93023256] mean value: 0.946723044397463 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.98305085 0.95081967 0.98305085 0.94915254 0.98245614 0.96551724 0.9122807 1. 0.94915254] mean value: 0.9608813868610071 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.90322581 0.96666667 0.90625 0.96666667 0.93333333 0.96551724 0.93333333 0.89655172 1. 0.90322581] mean value: 0.9374770578420467 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 1. 1. 0.96551724 1. 1. 0.92857143 1. 1. ] mean value: 0.9859605911330049 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.88275862 0.96428571 0.89285714 0.96428571 0.91133005 0.96666667 0.93333333 0.86428571 1. 0.9 ] mean value: 0.9279802955665025 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.96666667 0.90625 0.96666667 0.90322581 0.96551724 0.93333333 0.83870968 1. 0.90322581] mean value: 0.9258595198368558 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))])
key: fit_time value: [1.85724068 0.91922164 1.05045414 0.93645144 0.97065115 0.97242212 0.9203217 0.949862 0.94439721 0.943084 ] mean value: 1.046410608291626 key: score_time value: [0.23833275 0.25026321 0.27845311 0.22460794 0.25519276 0.24637866 0.28193164 0.22761822 0.26391268 0.16427517] mean value: 0.24309661388397216 key: test_mcc value: [0.74381228 0.79227876 0.84383267 0.94742759 0.83936556 0.94928891 0.8993825 0.7412616 1. 0.85004744] mean value: 0.8606697309329067 key: train_mcc value: [0.94850869 0.93708276 0.9487737 0.9544252 0.9374974 0.94824314 0.94824314 0.92545561 0.94290506 0.93724717] mean value: 0.9428381861445998 key: test_accuracy value: [0.88636364 0.90697674 0.93023256 0.97674419 0.93023256 0.97674419 0.95348837 0.88372093 1. 0.93023256] mean value: 0.9374735729386892 key: train_accuracy value: [0.97674419 0.97164948 0.97680412 0.97938144 0.97164948 0.97680412 0.97680412 0.96649485 0.9742268 0.97164948] mean value: 0.9742208103572285 key: test_fscore value: [0.91803279 0.93548387 0.95081967 0.98305085 0.94915254 0.98245614 0.96551724 0.91525424 1. 0.94915254] mean value: 0.9548919881205848 key: train_fscore value: [0.98272553 0.97888676 0.98272553 0.98461538 0.9789675 0.98272553 0.98272553 0.9752381 0.98091603 0.97904762] mean value: 0.9808573492217716 key: test_precision value: [0.875 0.87878788 0.90625 0.96666667 0.93333333 0.96551724 0.93333333 0.87096774 1. 
0.90322581] mean value: 0.9233082001887619 key: train_precision value: [0.96603774 0.96226415 0.96603774 0.96969697 0.9588015 0.96969697 0.96969697 0.95522388 0.96254682 0.95895522] mean value: 0.9638957950816772 key: test_recall value: [0.96551724 1. 1. 1. 0.96551724 1. 1. 0.96428571 1. 1. ] mean value: 0.9895320197044335 key: train_recall value: [1. 0.99609375 1. 1. 1. 0.99610895 0.99610895 0.99610895 1. 1. ] mean value: 0.9984420598249028 key: test_roc_auc value: [0.84942529 0.85714286 0.89285714 0.96428571 0.91133005 0.96666667 0.93333333 0.84880952 1. 0.9 ] mean value: 0.9123850574712644 key: train_roc_auc value: [0.96564885 0.96016809 0.96590909 0.96969697 0.95833333 0.96752012 0.96752012 0.95225295 0.96183206 0.95801527] mean value: 0.9626896859383592 key: test_jcc value: [0.84848485 0.87878788 0.90625 0.96666667 0.90322581 0.96551724 0.93333333 0.84375 1. 0.90322581] mean value: 0.9149241581555263 key: train_jcc value: [0.96603774 0.95864662 0.96603774 0.96969697 0.9588015 0.96603774 0.96603774 0.95167286 0.96254682 0.95895522] mean value: 0.962447093057542 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, 
booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02334285 0.00983024 0.00985956 0.00978518 0.00993013 0.00973535 0.00983548 0.00996304 0.0098989 0.00982642] mean value: 0.011200714111328124 key: score_time value: [0.01340628 0.00875068 0.00893688 0.00888753 0.00888848 0.00883889 0.00882483 0.00888896 0.00888062 0.00899267] mean value: 0.009329581260681152 key: test_mcc value: [0.49425287 0.55732713 0.67416594 0.74102654 0.36453202 0.79313677 0.63660014 0.46554006 0.63247577 0.52368994] mean value: 0.5882747173578298 key: train_mcc value: [0.65753645 0.65942575 0.67797225 0.64082272 0.65903595 0.64878733 0.64546126 0.66416419 0.6450794 0.6828107 ] mean value: 0.6581095999527902 key: test_accuracy value: [0.77272727 0.81395349 0.86046512 0.88372093 0.72093023 0.90697674 0.8372093 0.76744186 0.8372093 0.79069767] mean value: 0.8191331923890064 key: train_accuracy value: [0.8501292 0.85051546 0.85824742 0.84278351 0.85051546 0.84536082 0.84536082 0.85309278 0.84536082 0.86082474] mean value: 0.8502191054636511 key: test_fscore value: [0.82758621 0.87096774 0.9 0.9122807 0.79310345 0.93103448 0.8852459 0.83333333 0.88135593 0.84745763] mean value: 0.8682365375915616 key: train_fscore value: [0.89056604 0.89056604 0.89563567 0.88555347 0.89097744 0.88549618 0.88764045 0.89265537 0.8880597 0.89772727] mean value: 0.8904877637720091 key: test_precision value: [0.82758621 0.81818182 0.87096774 0.92857143 0.79310345 0.9 0.81818182 0.78125 0.83870968 0.80645161] mean value: 0.8383003752365543 key: train_precision value: [0.86131387 0.86131387 0.87084871 0.85198556 0.85869565 0.86891386 0.85559567 0.8649635 0.85304659 0.87453875] 
mean value: 0.8621216027021169 key: test_recall value: [0.82758621 0.93103448 0.93103448 0.89655172 0.79310345 0.96428571 0.96428571 0.89285714 0.92857143 0.89285714] mean value: 0.9022167487684729 key: train_recall value: [0.921875 0.921875 0.921875 0.921875 0.92578125 0.90272374 0.92217899 0.92217899 0.92607004 0.92217899] mean value: 0.9208611989299611 key: test_roc_auc value: [0.74712644 0.75123153 0.8226601 0.87684729 0.68226601 0.88214286 0.78214286 0.71309524 0.79761905 0.74642857] mean value: 0.7801559934318555 key: train_roc_auc value: [0.81589933 0.81699811 0.82836174 0.80563447 0.81516335 0.81777408 0.80841774 0.81986812 0.80654647 0.8313185 ] mean value: 0.8165981914150152 key: test_jcc value: [0.70588235 0.77142857 0.81818182 0.83870968 0.65714286 0.87096774 0.79411765 0.71428571 0.78787879 0.73529412] mean value: 0.7693889285919646 key: train_jcc value: [0.80272109 0.80272109 0.81099656 0.79461279 0.80338983 0.79452055 0.7979798 0.80612245 0.79865772 0.81443299] mean value: 0.8026154868282023 MCC on Blind test: 0.52 Accuracy on Blind test: 0.78 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.25924444 0.05974245 0.06969905 0.06822252 0.05877757 0.06371927 0.06096601 0.07231355 0.07287455 0.25041699] mean value: 0.10359764099121094 key: score_time value: [0.01188898 0.01114655 0.010535 0.01076055 0.01073456 0.01050711 0.01052427 0.01171732 0.01074386 0.01410055] mean value: 0.011265873908996582 key: test_mcc value: [0.84691397 1. 0.94742759 0.89408867 0.94928891 0.89761905 0.8993825 0.89761905 1. 0.8993825 ] mean value: 0.9231722243931844 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93181818 1. 0.97674419 0.95348837 0.97674419 0.95348837 0.95348837 0.95348837 1. 0.95348837] mean value: 0.9652748414376321 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 1. 0.98305085 0.96551724 0.98245614 0.96428571 0.96551724 0.96428571 1. 0.96551724] mean value: 0.9739782682890745 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 1. 0.96666667 0.96551724 1. 0.96428571 0.93333333 0.96428571 1. 0.93333333] mean value: 0.9660755336617406 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 1. 0.96551724 0.96551724 0.96428571 1. 0.96428571 1. 1. ] mean value: 0.982512315270936 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91609195 1. 
0.96428571 0.94704433 0.98275862 0.94880952 0.93333333 0.94880952 1. 0.93333333] mean value: 0.9574466338259442 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 1. 0.96666667 0.93333333 0.96551724 0.93103448 0.93333333 0.93103448 1. 0.93333333] mean value: 0.9497478680014831 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0659194 0.04976654 0.11305714 0.05395603 0.12562203 0.07698584 0.08231831 0.0750525 0.04251218 0.07967854] mean value: 0.07648684978485107 key: score_time value: [0.03880215 0.02202821 0.02107215 0.01819706 0.02468801 0.02061653 0.02234888 0.01253152 0.02122045 0.02172065] mean value: 0.022322559356689455 key: test_mcc value: [0.69655172 0.84515772 0.84515772 0.78817734 0.84515772 0.89761905 0.94928891 0.63689536 1. 0.69285714] mean value: 0.8196862694382097 key: train_mcc value: [0.97111276 0.96555771 0.95973448 0.97128177 0.97712771 0.95958106 0.97117251 0.9769295 0.9653608 0.96542571] mean value: 0.9683284032432156 key: test_accuracy value: [0.86363636 0.93023256 0.93023256 0.90697674 0.93023256 0.95348837 0.97674419 0.8372093 1. 
0.86046512] mean value: 0.9189217758985201 key: train_accuracy value: [0.9870801 0.98453608 0.98195876 0.9871134 0.98969072 0.98195876 0.9871134 0.98969072 0.98453608 0.98453608] mean value: 0.985821412397773 key: test_fscore value: [0.89655172 0.94736842 0.94736842 0.93103448 0.94736842 0.96428571 0.98245614 0.87719298 1. 0.89285714] mean value: 0.9386483450004321 key: train_fscore value: [0.99025341 0.98837209 0.98640777 0.99029126 0.99224806 0.98646035 0.99032882 0.99224806 0.98837209 0.98841699] mean value: 0.9893398907205294 key: test_precision value: [0.89655172 0.96428571 0.96428571 0.93103448 0.96428571 0.96428571 0.96551724 0.86206897 1. 0.89285714] mean value: 0.9405172413793104 key: train_precision value: [0.98832685 0.98076923 0.98069498 0.98455598 0.98461538 0.98076923 0.98461538 0.98841699 0.98455598 0.98084291] mean value: 0.9838162929119592 key: test_recall value: [0.89655172 0.93103448 0.93103448 0.93103448 0.93103448 0.96428571 1. 0.89285714 1. 0.89285714] mean value: 0.9370689655172414 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.99609375 1. 0.9922179 0.99610895 0.99610895 0.9922179 0.99610895] mean value: 0.9949325145914397 key: test_roc_auc value: [0.84827586 0.92980296 0.92980296 0.89408867 0.92980296 0.94880952 0.96666667 0.81309524 1. 0.84642857] mean value: 0.9106773399014779 key: train_roc_auc value: [0.98464337 0.97910748 0.97715436 0.98289536 0.98484848 0.97702498 0.9827873 0.98660409 0.98084177 0.97897051] mean value: 0.9814877701340265 key: test_jcc value: [0.8125 0.9 0.9 0.87096774 0.9 0.93103448 0.96551724 0.78125 1. 
0.80645161] mean value: 0.886772107897664 key: train_jcc value: [0.98069498 0.97701149 0.97318008 0.98076923 0.98461538 0.97328244 0.98084291 0.98461538 0.97701149 0.97709924] mean value: 0.9789122637095788 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02146697 0.01106024 0.00995302 0.0117507 0.01304126 0.01269269 0.01129675 0.01066136 0.01041627 0.01036024] mean value: 0.012269949913024903 key: score_time value: [0.01147509 0.01147795 0.01027989 0.00955701 0.00969648 0.010355 0.00968647 0.01100302 0.00903153 0.00946951] mean value: 0.010203194618225098 key: test_mcc value: [0.48006374 0.63464776 0.61634173 0.84383267 0.38115218 0.74102654 0.63660014 0.4630445 0.7952381 0.63247577] mean value: 0.6224423130557006 key: train_mcc value: [0.65864868 0.65370117 0.62501937 0.62426504 0.63554065 0.63458211 0.64594126 0.64794192 0.63340179 0.66958019] mean value: 0.6428622185866101 key: test_accuracy value: [0.77272727 0.8372093 0.8372093 0.93023256 0.69767442 0.88372093 0.8372093 0.76744186 0.90697674 0.8372093 ] mean value: 0.8307610993657506 key: train_accuracy value: [0.8501292 0.84793814 0.83505155 0.83505155 0.84020619 0.84020619 0.84536082 0.84536082 0.84020619 0.8556701 ] mean value: 0.8435180745358161 key: 
test_fscore value: [0.83333333 0.89230769 0.8852459 0.95081967 0.75471698 0.9122807 0.8852459 0.83870968 0.92857143 0.88135593] mean value: 0.8762587222131497 key: train_fscore value: [0.88973384 0.88846881 0.878327 0.87878788 0.88301887 0.88301887 0.88721805 0.88593156 0.88389513 0.89513109] mean value: 0.8853531081489168 key: test_precision value: [0.80645161 0.80555556 0.84375 0.90625 0.83333333 0.89655172 0.81818182 0.76470588 0.92857143 0.83870968] mean value: 0.8442061032455589 key: train_precision value: [0.86666667 0.86080586 0.85555556 0.85294118 0.8540146 0.85714286 0.85818182 0.866171 0.85198556 0.86281588] mean value: 0.8586280981124286 key: test_recall value: [0.86206897 1. 0.93103448 1. 0.68965517 0.92857143 0.96428571 0.92857143 0.92857143 0.92857143] mean value: 0.9161330049261084 key: train_recall value: [0.9140625 0.91796875 0.90234375 0.90625 0.9140625 0.91050584 0.91828794 0.90661479 0.91828794 0.92996109] mean value: 0.9138345087548638 key: test_roc_auc value: [0.73103448 0.75 0.78694581 0.89285714 0.70197044 0.86428571 0.78214286 0.69761905 0.89761905 0.79761905] mean value: 0.7902093596059113 key: train_roc_auc value: [0.81962667 0.81504498 0.8034446 0.80160985 0.8055161 0.80639796 0.81028901 0.81590281 0.80265542 0.81994238] mean value: 0.8100429772550632 key: test_jcc value: [0.71428571 0.80555556 0.79411765 0.90625 0.60606061 0.83870968 0.79411765 0.72222222 0.86666667 0.78787879] mean value: 0.7835864524206555 key: train_jcc value: [0.80136986 0.79931973 0.78305085 0.78378378 0.79054054 0.79054054 0.7972973 0.79522184 0.79194631 0.81016949] mean value: 0.7943240243778313 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest 
Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01843333 0.02376246 0.02306414 0.02690268 0.0255127 0.01901436 0.02600908 0.02429175 0.02094388 0.02146673] mean value: 0.02294011116027832 key: score_time value: [0.00926924 0.01239872 0.01190901 0.01186895 0.01200056 0.01189756 0.01200986 0.01208138 0.01191568 0.0124774 ] mean value: 0.0117828369140625 key: test_mcc value: [0.80277297 0.81883947 0.84383267 0.79990777 0.84515772 0.85004744 0.74102654 0.63660014 0.79313677 0.84515772] mean value: 0.7976479225553117 key: train_mcc value: [0.9653815 0.83710153 0.85834721 0.9713635 0.96575651 0.80728276 0.98276159 0.94253379 0.82968701 0.94253379] mean value: 0.9102749176546568 key: test_accuracy value: [0.90909091 0.90697674 0.93023256 0.90697674 0.93023256 0.93023256 0.88372093 0.8372093 0.90697674 0.93023256] mean value: 0.9071881606765327 key: train_accuracy value: [0.98449612 0.91752577 0.93556701 0.9871134 0.98453608 0.91237113 0.99226804 0.9742268 0.92268041 0.9742268 ] mean value: 0.9585011587948533 key: test_fscore value: [0.93548387 0.92592593 0.95081967 0.92857143 0.94736842 0.94915254 0.9122807 0.8852459 0.93103448 0.94736842] mean value: 0.9313251368226739 key: train_fscore value: [0.98837209 0.93360996 0.95327103 0.99021526 0.98841699 0.93772894 0.99415205 0.98084291 0.94464945 0.98084291] mean value: 0.9692101586933536 key: test_precision value: [0.87878788 1. 
0.90625 0.96296296 0.96428571 0.90322581 0.89655172 0.81818182 0.9 0.93103448] mean value: 0.9161280387566539 key: train_precision value: [0.98076923 0.99557522 0.91397849 0.99215686 0.97709924 0.88581315 0.99609375 0.96603774 0.89824561 0.96603774] mean value: 0.9571807030540272 key: test_recall value: [1. 0.86206897 1. 0.89655172 0.93103448 1. 0.92857143 0.96428571 0.96428571 0.96428571] mean value: 0.9511083743842365 key: train_recall value: [0.99609375 0.87890625 0.99609375 0.98828125 1. 0.99610895 0.9922179 0.99610895 0.99610895 0.99610895] mean value: 0.9836028696498055 key: test_roc_auc value: [0.86666667 0.93103448 0.89285714 0.91256158 0.92980296 0.9 0.86428571 0.78214286 0.88214286 0.91547619] mean value: 0.8876970443349754 key: train_roc_auc value: [0.97896291 0.93566525 0.90713778 0.98656487 0.97727273 0.87210028 0.99229216 0.96370333 0.88736745 0.96370333] mean value: 0.9464770073439868 key: test_jcc value: [0.87878788 0.86206897 0.90625 0.86666667 0.9 0.90322581 0.83870968 0.79411765 0.87096774 0.9 ] mean value: 0.8720794383837062 key: train_jcc value: [0.97701149 0.87548638 0.91071429 0.98062016 0.97709924 0.88275862 0.98837209 0.96240602 0.8951049 0.96240602] mean value: 0.9411979191863091 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01731038 0.02026916 0.01957202 0.03974581 0.02020097 0.02520275 0.01930141 0.02426219 0.01893759 0.02047873] mean value: 0.02252810001373291 key: score_time value: [0.01059461 0.01298666 0.02482533 0.0290792 0.01322126 0.01507759 0.01336336 0.01371551 0.01302266 0.01305866] mean value: 0.015894484519958497 key: test_mcc value: [0.55250625 0.67480294 0.84383267 0.79990777 0.62324149 0.80104099 0.7952381 0.68920734 0.86258195 0.75210143] mean value: 0.73944609392203 key: train_mcc value: [0.5134357 0.86489996 0.92010222 0.93185204 0.90318906 0.80728276 0.91707893 0.89149349 0.95379209 0.75859788] mean value: 0.8461724129400746 key: test_accuracy value: [0.79545455 0.86046512 0.93023256 0.90697674 0.8372093 0.90697674 0.90697674 0.86046512 0.93023256 0.88372093] mean value: 0.8818710359408034 key: train_accuracy value: [0.78036176 0.93814433 0.96391753 0.96907216 0.95618557 0.91237113 0.96134021 0.95103093 0.97938144 0.88917526] mean value: 0.9300980313806974 key: test_fscore value: [0.86567164 0.90322581 0.95081967 0.92857143 0.88135593 0.93333333 0.92857143 0.9 0.94339623 0.91803279] mean value: 0.9152978256353725 key: train_fscore value: [0.85762144 0.95522388 0.97328244 0.97637795 0.96774194 0.93772894 0.97017893 0.96421846 0.98449612 0.92280072] mean value: 0.9509670814198928 key: test_precision value: [0.76315789 0.84848485 0.90625 0.96296296 0.86666667 0.875 0.92857143 0.84375 1. 
0.84848485] mean value: 0.8843328649907597 key: train_precision value: [0.75073314 0.91428571 0.95149254 0.98412698 0.94095941 0.88581315 0.99186992 0.93430657 0.98069498 0.85666667] mean value: 0.9190949067342966 key: test_recall value: [1. 0.96551724 1. 0.89655172 0.89655172 1. 0.92857143 0.96428571 0.89285714 1. ] mean value: 0.9544334975369458 key: train_recall value: [1. 1. 0.99609375 0.96875 0.99609375 0.99610895 0.94941634 0.99610895 0.98832685 1. ] mean value: 0.9890898589494164 key: test_roc_auc value: [0.7 0.80418719 0.89285714 0.91256158 0.80541872 0.86666667 0.89761905 0.81547619 0.94642857 0.83333333] mean value: 0.8474548440065681 key: train_roc_auc value: [0.67557252 0.90909091 0.94880445 0.96922348 0.93744081 0.87210028 0.96707458 0.92935218 0.97507945 0.83587786] mean value: 0.9019616539715853 key: test_jcc value: [0.76315789 0.82352941 0.90625 0.86666667 0.78787879 0.875 0.86666667 0.81818182 0.89285714 0.84848485] mean value: 0.8448673237237478 key: train_jcc value: [0.75073314 0.91428571 0.94795539 0.95384615 0.9375 0.88275862 0.94208494 0.93090909 0.96946565 0.85666667] mean value: 0.908620536550167 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18523073 0.16147161 0.16436124 0.15916276 0.16037703 0.16199589 0.15805674 0.15851617 0.15941811 0.16083217] mean value: 0.1629422426223755 key: score_time value: [0.01647878 0.01538658 0.01684141 0.01527929 0.01533222 0.01516366 0.01532793 0.01527905 0.01649475 0.01618528] mean value: 0.015776896476745607 key: test_mcc value: [0.84691397 1. 0.94742759 0.89545704 0.89408867 0.94928891 0.8993825 0.89761905 1. 0.8993825 ] mean value: 0.9229560240493224 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93181818 1. 0.97674419 0.95348837 0.95348837 0.97674419 0.95348837 0.95348837 1. 0.95348837] mean value: 0.9652748414376321 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 1. 0.98305085 0.96666667 0.96551724 0.98245614 0.96551724 0.96428571 1. 0.96551724] mean value: 0.9742163635271698 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 1. 0.96666667 0.93548387 0.96551724 0.96551724 0.93333333 0.96428571 1. 0.93333333] mean value: 0.9597470734678744 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 1. 1. 0.96551724 1. 1. 0.96428571 1. 1. ] mean value: 0.9895320197044335 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91609195 1. 0.96428571 0.92857143 0.94704433 0.96666667 0.93333333 0.94880952 1. 0.93333333] mean value: 0.9538136288998358 key: train_roc_auc value: [1. 
1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 1. 0.96666667 0.93548387 0.93333333 0.96551724 0.93333333 0.93103448 1. 0.93333333] mean value: 0.9501928068223953 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0493753 0.0488708 0.05318427 0.08073163 0.08004856 0.07434177 0.05143976 0.06507373 0.07488728 0.07277703] mean value: 0.06507301330566406 key: score_time value: [0.0248158 0.01857805 0.02386665 0.03749061 0.02310753 0.03106332 0.020509 0.02412271 0.02441359 0.03884387] mean value: 0.026681113243103027 key: test_mcc value: [0.84691397 0.94928891 0.94742759 0.89408867 0.9025825 0.84984956 0.8993825 0.89761905 1. 0.8993825 ] mean value: 0.9086535254177944 key: train_mcc value: [0.98856835 0.98276159 0.99426489 0.99426489 0.98851799 0.98276159 1. 0.99426489 0.99424345 0.98849821] mean value: 0.9908145839110009 key: test_accuracy value: [0.93181818 0.97674419 0.97674419 0.95348837 0.95348837 0.93023256 0.95348837 0.95348837 1. 0.95348837] mean value: 0.9582980972515857 key: train_accuracy value: [0.99483204 0.99226804 0.99742268 0.99742268 0.99484536 0.99226804 1. 0.99742268 0.99742268 0.99484536] mean value: 0.9958749567116865 key: test_fscore value: [0.94915254 0.98245614 0.98305085 0.96551724 0.96428571 0.94545455 0.96551724 0.96428571 1. 0.96551724] mean value: 0.9685237228345291 key: train_fscore value: [0.99607843 0.99415205 0.99805068 0.99805068 0.99609375 0.99415205 1. 0.99805068 0.99805825 0.99612403] mean value: 0.9968810605158362 key: test_precision value: [0.93333333 1. 0.96666667 0.96551724 1. 0.96296296 0.93333333 0.96428571 1. 0.93333333] mean value: 0.9659432585294654 key: train_precision value: [1. 0.9922179 0.99610895 0.99610895 0.99609375 0.99609375 1. 1. 
0.99612403 0.99227799] mean value: 0.9965025320951114 key: test_recall value: [0.96551724 0.96551724 1. 0.96551724 0.93103448 0.92857143 1. 0.96428571 1. 1. ] mean value: 0.9720443349753695 key: train_recall value: [0.9921875 0.99609375 1. 1. 0.99609375 0.9922179 1. 0.99610895 1. 1. ] mean value: 0.9972701848249027 key: test_roc_auc value: [0.91609195 0.98275862 0.96428571 0.94704433 0.96551724 0.93095238 0.93333333 0.94880952 1. 0.93333333] mean value: 0.9522126436781609 key: train_roc_auc value: [0.99609375 0.99047112 0.99621212 0.99621212 0.994259 0.99229216 1. 0.99805447 0.99618321 0.99236641] mean value: 0.9952144354612601 key: test_jcc value: [0.90322581 0.96551724 0.96666667 0.93333333 0.93103448 0.89655172 0.93333333 0.93103448 1. 0.93333333] mean value: 0.9394030404152762 key: train_jcc value: [0.9921875 0.98837209 0.99610895 0.99610895 0.9922179 0.98837209 1. 0.99610895 0.99612403 0.99227799] mean value: 0.9937878456413968 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.10673809 0.12460399 0.13710308 0.20830345 0.13556242 0.12027812 0.12287211 0.12001324 0.12852883 0.11976194] mean value: 0.1323765277862549 key: score_time value: [0.03763318 0.02303171 0.02376223 0.02338004 0.02684474 0.02358699 0.02333927 0.0231626 0.02329302 0.02722192] mean value: 0.025525569915771484 key: test_mcc value: [0.34678431 0.36578221 0.61849012 0.68226601 0.49649436 0.68689103 0.27702563 0.26854231 0.63660014 0.5773737 ] mean value: 0.4956249811050279 key: train_mcc value: [0.94850869 0.96008603 0.97143696 0.9544252 0.96008603 0.94290506 0.94290506 0.94290506 0.94857137 0.96562399] mean value: 0.9537453449502975 key: test_accuracy value: [0.72727273 0.74418605 0.8372093 0.86046512 0.79069767 0.86046512 0.69767442 0.69767442 0.8372093 0.81395349] mean value: 0.7866807610993658 key: train_accuracy value: [0.97674419 0.98195876 0.9871134 0.97938144 0.98195876 0.9742268 0.9742268 0.9742268 0.97680412 0.98453608] mean value: 0.9791177175737233 key: test_fscore value: [0.82352941 0.83076923 0.88888889 0.89655172 0.85714286 0.89655172 0.79365079 0.8 0.8852459 0.86666667] mean value: 0.8538997198798349 key: train_fscore value: [0.98272553 0.98651252 0.99032882 0.98461538 0.98651252 0.98091603 0.98091603 0.98091603 0.98279159 0.98846154] mean value: 0.984469599779477 key: test_precision value: [0.71794872 0.75 0.82352941 0.89655172 0.79411765 0.86666667 0.71428571 0.7027027 0.81818182 0.8125 ] mean value: 0.789648440274708 key: train_precision value: [0.96603774 0.97338403 0.98084291 0.96969697 0.97338403 0.96254682 0.96254682 
0.96254682 0.96616541 0.97718631] mean value: 0.9694337853019032 key: test_recall value: [0.96551724 0.93103448 0.96551724 0.89655172 0.93103448 0.92857143 0.89285714 0.92857143 0.96428571 0.92857143] mean value: 0.9332512315270937 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.61609195 0.64408867 0.76847291 0.841133 0.71551724 0.83095238 0.61309524 0.59761905 0.78214286 0.76428571] mean value: 0.7173399014778326 key: train_roc_auc value: [0.96564885 0.97348485 0.98106061 0.96969697 0.97348485 0.96183206 0.96183206 0.96183206 0.96564885 0.97709924] mean value: 0.9691620402498266 key: test_jcc value: [0.7 0.71052632 0.8 0.8125 0.75 0.8125 0.65789474 0.66666667 0.79411765 0.76470588] mean value: 0.7468911248710011 key: train_jcc value: [0.96603774 0.97338403 0.98084291 0.96969697 0.97338403 0.96254682 0.96254682 0.96254682 0.96616541 0.97718631] mean value: 0.9694337853019032 MCC on Blind test: 0.27 Accuracy on Blind test: 0.69 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, 
gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.61192012 0.59444022 0.59235024 0.59150219 0.58671045 0.59856415 0.60128736 0.59110236 0.59164453 0.59326339] mean value: 0.5952785015106201 key: score_time value: [0.0099473 0.00931048 0.00948477 0.00931215 0.0097363 0.00960946 0.0094378 0.00950861 0.00967026 0.0094254 ] mean value: 0.0095442533493042 key: test_mcc value: [0.84691397 0.94928891 1. 0.84515772 0.89408867 0.89761905 0.8993825 0.89761905 1. 0.8993825 ] mean value: 0.9129452373166119 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93181818 0.97674419 1. 0.93023256 0.95348837 0.95348837 0.95348837 0.95348837 1. 0.95348837] mean value: 0.9606236786469344 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 0.98245614 1. 0.94736842 0.96551724 0.96428571 0.96551724 0.96428571 1. 0.96551724] mean value: 0.970410025648575 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 1. 1. 0.96428571 0.96551724 0.96428571 0.93333333 0.96428571 1. 0.93333333] mean value: 0.9658374384236453 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.93103448 0.96551724 0.96428571 1. 0.96428571 1. 1. ] mean value: 0.975615763546798 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91609195 0.98275862 1. 0.92980296 0.94704433 0.94880952 0.93333333 0.94880952 1. 
0.93333333] mean value: 0.9539983579638752 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 0.96551724 1. 0.9 0.93333333 0.93103448 0.93333333 0.93103448 1. 0.93333333] mean value: 0.9430812013348164 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables
are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02912045 0.02907252 0.02943158 0.02924204 0.03977633 0.04929161 0.04781103 0.05106616 0.04255533 0.02873945] mean value: 0.037610650062561035 key: score_time value: [0.01276016 0.01829052 0.01310325 0.01456141 0.01490211 0.01842403 0.01963758 0.01851559 0.0146873 0.01568079] mean value: 0.016056275367736815 key: test_mcc value: [ 0.0446356 0.1779546 0.29311846 0.0045305 0.1993421 0.02624453 0.07005059 0.15163508 0.1015749 -0.20044593] mean value: 0.08686404365068628 key: train_mcc value: [0.3513966 0.34143168 0.31600695 0.34143168 0.33312685 0.34340112 0.34340112 0.36757639 0.35159962 0.35965479] mean value: 0.34490268018852405 key: test_accuracy value: [0.63636364 0.6744186 0.72093023 0.65116279 0.69767442 0.60465116 0.65116279 0.65116279 0.65116279 0.58139535] mean value: 0.6520084566596195 key: train_accuracy value: [0.72093023 0.71649485 0.70876289 0.71649485 0.71391753 0.71907216 0.71907216 
0.72680412 0.72164948 0.7242268 ] mean value: 0.7187425077918964 key: test_fscore value: [0.76470588 0.78125 0.81818182 0.7826087 0.8115942 0.73015873 0.7826087 0.76190476 0.7761194 0.73529412] mean value: 0.7744426307433283 key: train_fscore value: [0.82580645 0.82315113 0.8192 0.82315113 0.82182986 0.82504013 0.82504013 0.82903226 0.82636656 0.82769726] mean value: 0.824631489480623 key: test_precision value: [0.66666667 0.71428571 0.72972973 0.675 0.7 0.65714286 0.65853659 0.68571429 0.66666667 0.625 ] mean value: 0.6778742505571774 key: train_precision value: [0.7032967 0.69945355 0.69376694 0.69945355 0.69754768 0.70218579 0.70218579 0.70798898 0.70410959 0.70604396] mean value: 0.7016032539215682 key: test_recall value: [0.89655172 0.86206897 0.93103448 0.93103448 0.96551724 0.82142857 0.96428571 0.85714286 0.92857143 0.89285714] mean value: 0.9050492610837438 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.51494253 0.57389163 0.60837438 0.50123153 0.55418719 0.51071429 0.51547619 0.56190476 0.53095238 0.44642857] mean value: 0.5318103448275862 key: train_roc_auc value: [0.58778626 0.58333333 0.5719697 0.58333333 0.57954545 0.58396947 0.58396947 0.59541985 0.58778626 0.59160305] mean value: 0.5848716169326856 key: test_jcc value: [0.61904762 0.64102564 0.69230769 0.64285714 0.68292683 0.575 0.64285714 0.61538462 0.63414634 0.58139535] mean value: 0.6326948373048771 key: train_jcc value: [0.7032967 0.69945355 0.69376694 0.69945355 0.69754768 0.70218579 0.70218579 0.70798898 0.70410959 0.70604396] mean value: 0.7016032539215682 MCC on Blind test: -0.21 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), 
('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03475094 0.03894949 0.03813624 0.03801632 0.03756666 0.03808069 0.03824639 0.03827929 0.03780437 0.03802657] mean value: 0.037785696983337405 key: score_time value: [0.02142906 0.02273035 0.02185512 0.02089429 0.02069378 0.02342296 0.02245855 0.02128291 0.02075934 0.0240798 ] mean value: 0.02196061611175537 key: test_mcc value: [0.85146932 0.94928891 0.89408867 0.89408867 0.73130353 0.94928891 0.80104099 0.80104099 0.94928891 0.84515772] mean value: 0.8666056604496469 key: train_mcc value: [0.93647644 0.94844498 0.93680394 0.95413965 0.9544252 0.95381103 0.95968877 0.93655577 0.94253379 0.94824314] mean value: 0.9471122705214379 key: test_accuracy value: [0.93181818 0.97674419 0.95348837 0.95348837 0.88372093 0.97674419 0.90697674 0.90697674 0.97674419 0.93023256] mean value: 0.9396934460887949 key: train_accuracy value: [0.97157623 0.97680412 0.97164948 0.97938144 0.97938144 0.97938144 0.98195876 0.97164948 0.9742268 0.97680412] mean value: 0.9762813340792242 key: test_fscore value: [0.95081967 0.98245614 0.96551724 0.96551724 0.91525424 0.98245614 0.93333333 0.93333333 0.98245614 0.94736842] mean value: 0.9558511900949833 key: train_fscore value: [0.97880539 0.98265896 0.97880539 0.98455598 0.98461538 0.98455598 0.98651252 0.97888676 0.98084291 0.98272553] mean value: 0.982296482327693 key: test_precision value: [0.90625 1. 
0.96551724 0.96551724 0.9 0.96551724 0.875 0.875 0.96551724 0.93103448] mean value: 0.9349353448275862 key: train_precision value: [0.96577947 0.96958175 0.96577947 0.97328244 0.96969697 0.97701149 0.97709924 0.96590909 0.96603774 0.96969697] mean value: 0.969987462420492 key: test_recall value: [1. 0.96551724 0.96551724 0.96551724 0.93103448 1. 1. 1. 1. 0.96428571] mean value: 0.9791871921182266 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.99609375 1. 0.9922179 0.99610895 0.9922179 0.99610895 0.99610895] mean value: 0.9949325145914397 key: test_roc_auc value: [0.9 0.98275862 0.94704433 0.94704433 0.85837438 0.96666667 0.86666667 0.86666667 0.96666667 0.91547619] mean value: 0.9217364532019705 key: train_roc_auc value: [0.9617426 0.96774384 0.96200284 0.97153172 0.96969697 0.97320819 0.97515371 0.9617578 0.96370333 0.96752012] mean value: 0.9674061138767979 key: test_jcc value: [0.90625 0.96551724 0.93333333 0.93333333 0.84375 0.96551724 0.875 0.875 0.96551724 0.9 ] mean value: 0.9163218390804598 key: train_jcc value: [0.95849057 0.96590909 0.95849057 0.96958175 0.96969697 0.96958175 0.97338403 0.95864662 0.96240602 0.96603774] mean value: 0.9652225088626647 MCC on Blind test: 0.88 Accuracy on Blind test: 0.94 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.22743607 0.27104139 0.27143574 0.37188888 0.2696569 0.27980232 0.29186916 0.26531053 0.31099105 0.19231105] mean value: 0.27517430782318114 key: score_time value: [0.02266073 0.02105594 0.04440308 0.01608038 0.01722169 0.02272081 0.02047729 0.02023792 0.01908875 0.01224136] mean value: 0.02161879539489746 key: test_mcc value: [0.79532948 0.94928891 0.89408867 0.89408867 0.73130353 0.94928891 0.80104099 0.63247577 0.94928891 0.84984956] mean value: 0.8446043383090877 key: train_mcc value: [0.96531572 0.95984378 0.93680394 0.95413965 0.9544252 0.95381103 0.95968877 0.95958106 0.94253379 0.95381103] mean value: 0.9539953964174639 key: test_accuracy value: [0.90909091 0.97674419 0.95348837 0.95348837 0.88372093 0.97674419 0.90697674 0.8372093 0.97674419 0.93023256] mean value: 0.9304439746300212 key: train_accuracy value: [0.98449612 0.98195876 0.97164948 0.97938144 0.97938144 0.97938144 0.98195876 0.98195876 0.9742268 0.97938144] mean value: 0.9793774474546472 key: test_fscore value: [0.93333333 0.98245614 0.96551724 0.96551724 0.91525424 0.98245614 0.93333333 0.88135593 0.98245614 0.94545455] mean value: 0.948713428542399 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:107: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:110: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.98832685 0.98646035 0.97880539 0.98455598 0.98461538 0.98455598 0.98651252 0.98646035 0.98084291 0.98455598] mean value: 0.9845691713809857 key: test_precision value: [0.90322581 1. 0.96551724 0.96551724 0.9 0.96551724 0.875 0.83870968 0.96551724 0.96296296] mean value: 0.9341967412351172 key: train_precision value: [0.98449612 0.97701149 0.96577947 0.97328244 0.96969697 0.97701149 0.97709924 0.98076923 0.96603774 0.97701149] mean value: 0.9748195690174807 key: test_recall value: [0.96551724 0.96551724 0.96551724 0.96551724 0.93103448 1. 1. 0.92857143 1. 0.92857143] mean value: 0.9650246305418719 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.99609375 1. 
0.9922179 0.99610895 0.9922179 0.99610895 0.9922179 ] mean value: 0.994543409533074 key: test_roc_auc value: [0.88275862 0.98275862 0.94704433 0.94704433 0.85837438 0.96666667 0.86666667 0.79761905 0.96666667 0.93095238] mean value: 0.9146551724137931 key: train_roc_auc value: [0.98082657 0.9753196 0.96200284 0.97153172 0.96969697 0.97320819 0.97515371 0.97702498 0.96370333 0.97320819] mean value: 0.9721676103876334 key: test_jcc value: [0.875 0.96551724 0.93333333 0.93333333 0.84375 0.96551724 0.875 0.78787879 0.96551724 0.89655172] mean value: 0.9041398902821317 key: train_jcc value: [0.97692308 0.97328244 0.95849057 0.96958175 0.96969697 0.96958175 0.97338403 0.97328244 0.96240602 0.96958175] mean value: 0.96962107907581 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, 
n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04111385 0.03918624 0.03823042 0.04130435 0.03554368 0.03858113 0.03941131 0.03819084 0.03714442 0.03736854] mean value: 0.03860747814178467 key: score_time value: [0.01241827 0.01450682 0.01431775 0.01575279 0.01216865 0.01636386 0.01581478 0.01859069 0.01580238 0.01748109] mean value: 0.015321707725524903 key: test_mcc value: [0.8953202 0.92980296 0.78940887 0.8615634 0.65634573 0.89988258 0.8953202 0.82490815 0.86189955 0.86189955] mean value: 0.8476351166112527 key: train_mcc value: [0.89480004 0.9025977 0.89491047 0.89108657 0.90309643 0.90259326 0.88726363 0.89126481 0.88699724 0.90667624] mean value: 0.8961286395835659 key: test_accuracy value: [0.94736842 0.96491228 0.89473684 0.92982456 0.8245614 0.94736842 0.94736842 0.9122807 0.92982456 0.92982456] mean value: 0.9228070175438596 key: train_accuracy value: [0.94736842 0.95126706 0.94736842 0.9454191 0.95126706 0.95126706 0.94346979 0.9454191 0.94346979 0.95321637] mean value: 0.947953216374269 key: test_fscore value: [0.94736842 0.96551724 0.89655172 0.93333333 0.81481481 0.94915254 0.94736842 0.90909091 0.93103448 0.93103448] mean value: 0.9225266372751685 key: train_fscore value: [0.94757282 0.95145631 0.94777563 0.94594595 0.95201536 0.9516441 0.94433781 0.94636015 0.94390716 0.95384615] mean value: 0.9484861432129039 key: test_precision value: [0.96428571 0.96551724 0.89655172 0.90322581 0.88 0.90322581 0.93103448 0.92592593 0.9 0.9 ] mean value: 0.9169766701390728 key: train_precision value: [0.94208494 0.94594595 0.93869732 0.9351145 0.93584906 0.94615385 0.93181818 0.93207547 
0.93846154 0.94296578] mean value: 0.9389166584058478 key: test_recall value: [0.93103448 0.96551724 0.89655172 0.96551724 0.75862069 1. 0.96428571 0.89285714 0.96428571 0.96428571] mean value: 0.930295566502463 key: train_recall value: [0.953125 0.95703125 0.95703125 0.95703125 0.96875 0.95719844 0.95719844 0.96108949 0.94941634 0.96498054] mean value: 0.9582852018482491 key: test_roc_auc value: [0.9476601 0.96490148 0.89470443 0.92918719 0.82573892 0.94827586 0.9476601 0.91194581 0.93041872 0.93041872] mean value: 0.9230911330049262 key: train_roc_auc value: [0.94737962 0.95127827 0.94738722 0.9454417 0.95130107 0.95125547 0.94344297 0.9453885 0.94345817 0.9531934 ] mean value: 0.9479526386186771 key: test_jcc value: [0.9 0.93333333 0.8125 0.875 0.6875 0.90322581 0.9 0.83333333 0.87096774 0.87096774] mean value: 0.8586827956989247 key: train_jcc value: [0.900369 0.90740741 0.90073529 0.8974359 0.90842491 0.90774908 0.89454545 0.89818182 0.89377289 0.91176471] mean value: 0.9020386460949191 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.85042238 1.0178709 0.85435915 0.93712735 0.84169865 0.8663528 1.05083084 0.91208911 1.01056862 0.870682 ] mean value: 0.921200180053711 key: score_time value: [0.01473236 0.01605916 0.01670599 0.01941013 0.01508021 0.01618195 0.01466751 0.01649761 0.0150547 0.01528406] mean value: 0.015967369079589844 key: test_mcc value: [0.96551724 0.8953202 0.85960591 0.85960591 0.8953202 0.86189955 0.8951918 0.92980296 0.8951918 0.86789789] mean value: 0.8925353457793551 key: train_mcc value: [1. 0.98837192 0.98831165 1. 0.98831165 1. 0.98443509 0.98443509 1. 1. ] mean value: 0.9933865392883063 key: test_accuracy value: [0.98245614 0.94736842 0.92982456 0.92982456 0.94736842 0.92982456 0.94736842 0.96491228 0.94736842 0.92982456] mean value: 0.9456140350877192 key: train_accuracy value: [1. 0.99415205 0.99415205 1. 0.99415205 1. 0.99220273 0.99220273 1. 1. ] mean value: 0.9966861598440546 key: test_fscore value: [0.98245614 0.94736842 0.93103448 0.93103448 0.94736842 0.93103448 0.94545455 0.96428571 0.94545455 0.92307692] mean value: 0.9448568159003731 key: train_fscore value: [1. 0.99417476 0.99415205 1. 0.99415205 1. 0.99224806 0.99224806 1. 1. ] mean value: 0.9966974974879813 key: test_precision value: [1. 0.96428571 0.93103448 0.93103448 0.96428571 0.9 0.96296296 0.96428571 0.96296296 1. ] mean value: 0.958085203430031 key: train_precision value: [1. 0.98841699 0.9922179 1. 0.9922179 1. 0.98841699 0.98841699 1. 1. 
] mean value: 0.9949686762916335 key: test_recall value: [0.96551724 0.93103448 0.93103448 0.93103448 0.93103448 0.96428571 0.92857143 0.96428571 0.92857143 0.85714286] mean value: 0.9332512315270935 key: train_recall value: [1. 1. 0.99609375 1. 0.99609375 1. 0.99610895 0.99610895 1. 1. ] mean value: 0.9984405398832685 key: test_roc_auc value: [0.98275862 0.9476601 0.92980296 0.92980296 0.9476601 0.93041872 0.94704433 0.96490148 0.94704433 0.92857143] mean value: 0.9455665024630543 key: train_roc_auc value: [1. 0.99416342 0.99415582 1. 0.99415582 1. 0.9921951 0.9921951 1. 1. ] mean value: 0.9966865272373541 key: test_jcc value: [0.96551724 0.9 0.87096774 0.87096774 0.9 0.87096774 0.89655172 0.93103448 0.89655172 0.85714286] mean value: 0.8959701255363102 key: train_jcc value: [1. 0.98841699 0.98837209 1. 0.98837209 1. 0.98461538 0.98461538 1. 1. ] mean value: 0.993439194369427 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01497316 0.01095653 0.01136231 0.01050925 0.01016259 0.01015019 0.01007819 0.0104444 0.0105536 0.01053381] mean value: 0.010972404479980468 key: score_time value: [0.01388407 0.00955224 0.00952005 0.00899887 0.00899982 0.00913286 0.00914168 0.00911069 0.00929952 0.00911975] mean value: 0.009675955772399903 key: test_mcc value: [0.65018988 0.8615634 0.622444 0.46490107 0.64901478 0.72242731 0.71921182 0.79778885 0.82512315 0.61805122] mean value: 0.6930715492282729 key: train_mcc value: [0.73363539 0.70504029 0.73152229 0.71648082 0.75803947 0.68918921 0.71149566 0.7453938 0.70358595 0.72724584] mean value: 0.7221628704440908 key: test_accuracy value: [0.8245614 0.92982456 0.80701754 0.71929825 0.8245614 0.84210526 0.85964912 0.89473684 0.9122807 0.80701754] mean value: 0.8421052631578947 key: train_accuracy value: [0.86354776 0.84990253 0.86354776 0.85575049 0.87719298 0.84210526 0.85380117 0.87134503 0.84795322 0.85964912] mean value: 0.8584795321637426 key: test_fscore value: [0.83333333 0.93333333 0.82539683 0.76470588 0.82758621 0.86153846 0.85714286 0.9 0.9122807 0.81355932] mean value: 0.8528876923782588 key: train_fscore value: [0.87179487 0.85819521 0.87037037 0.86346863 0.88268156 0.85137615 0.86136784 0.87686567 0.85869565 0.86956522] mean value: 0.8664381178218032 key: test_precision value: [0.80645161 0.90322581 0.76470588 0.66666667 0.82758621 0.75675676 0.85714286 0.84375 0.89655172 0.77419355] mean value: 0.809703106169564 key: train_precision value: [0.82068966 0.81184669 0.82746479 0.81818182 0.84341637 0.80555556 0.82042254 0.84229391 0.80338983 
0.81355932] mean value: 0.8206820472208091 key: test_recall value: [0.86206897 0.96551724 0.89655172 0.89655172 0.82758621 1. 0.85714286 0.96428571 0.92857143 0.85714286] mean value: 0.9055418719211823 key: train_recall value: [0.9296875 0.91015625 0.91796875 0.9140625 0.92578125 0.90272374 0.90661479 0.91439689 0.92217899 0.93385214] mean value: 0.917742278696498 key: test_roc_auc value: [0.82389163 0.92918719 0.80541872 0.716133 0.82450739 0.84482759 0.85960591 0.89593596 0.91256158 0.80788177] mean value: 0.8419950738916256 key: train_roc_auc value: [0.86367643 0.85001976 0.86365364 0.85586393 0.87728751 0.84198687 0.85369802 0.87126094 0.84780824 0.8595042 ] mean value: 0.8584759545233464 key: test_jcc value: [0.71428571 0.875 0.7027027 0.61904762 0.70588235 0.75675676 0.75 0.81818182 0.83870968 0.68571429] mean value: 0.7466280927049428 key: train_jcc value: [0.77272727 0.7516129 0.7704918 0.75974026 0.79 0.74121406 0.75649351 0.7807309 0.75238095 0.76923077] mean value: 0.764462242159521 MCC on Blind test: 0.75 Accuracy on Blind test: 0.89 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0110116 0.01083779 0.01080775 0.01078486 0.01069784 0.01067305 0.01072073 0.01070118 0.01066279 0.01066947] mean value: 0.010756707191467286 key: score_time value: [0.00927234 0.00937843 0.00921917 0.00925756 0.00914478 0.0092566 0.00927472 0.00924993 0.0091548 0.00928521] mean value: 0.009249353408813476 key: test_mcc value: [0.71921182 0.86189955 0.71921182 0.65104858 0.51048128 0.7589669 0.82942474 0.69581469 0.78940887 0.65018988] mean value: 0.7185658130544192 key: train_mcc value: [0.75451908 0.73892092 0.74278722 0.75068043 0.77951916 0.75838325 0.75443104 0.75058523 0.7505054 0.75048638] mean value: 0.7530818106690766 key: test_accuracy value: [0.85964912 0.92982456 0.85964912 0.8245614 0.75438596 0.87719298 0.9122807 0.84210526 0.89473684 0.8245614 ] mean value: 0.8578947368421053 key: train_accuracy value: [0.87719298 0.86939571 0.87134503 0.87524366 0.88888889 0.8791423 0.87719298 0.87524366 0.87524366 0.87524366] mean value: 0.8764132553606238 key: test_fscore value: [0.86206897 0.92857143 0.86206897 0.82142857 0.75 0.88135593 0.91525424 0.85245902 0.89285714 0.81481481] mean value: 0.8580879074591409 key: train_fscore value: [0.87573964 0.8678501 0.87209302 0.87351779 0.89224953 0.87843137 0.87814313 0.8745098 0.87596899 0.87548638] mean value: 0.8763989764320921 key: test_precision value: [0.86206897 0.96296296 0.86206897 0.85185185 0.77777778 0.83870968 0.87096774 0.78787879 0.89285714 0.84615385] mean value: 0.8553297719871691 key: train_precision value: [0.88446215 0.87649402 0.86538462 0.884 0.86446886 0.88537549 0.87307692 0.88142292 0.87258687 
0.87548638] mean value: 0.876275825111137 key: test_recall value: [0.86206897 0.89655172 0.86206897 0.79310345 0.72413793 0.92857143 0.96428571 0.92857143 0.89285714 0.78571429] mean value: 0.8637931034482759 key: train_recall value: [0.8671875 0.859375 0.87890625 0.86328125 0.921875 0.87159533 0.88326848 0.86770428 0.87937743 0.87548638] mean value: 0.8768056906614786 key: test_roc_auc value: [0.85960591 0.93041872 0.85960591 0.82512315 0.75492611 0.87807882 0.91317734 0.84359606 0.89470443 0.82389163] mean value: 0.8583128078817734 key: train_roc_auc value: [0.87717352 0.86937622 0.87135974 0.87522039 0.88895306 0.87915704 0.87718112 0.87525839 0.87523559 0.87524319] mean value: 0.8764158256322957 key: test_jcc value: [0.75757576 0.86666667 0.75757576 0.6969697 0.6 0.78787879 0.84375 0.74285714 0.80645161 0.6875 ] mean value: 0.7547225422427035 key: train_jcc value: [0.77894737 0.76655052 0.77319588 0.7754386 0.80546075 0.78321678 0.78275862 0.77700348 0.77931034 0.77854671] mean value: 0.7800429060559617 MCC on Blind test: 0.6 Accuracy on Blind test: 0.81 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00980592 0.01067448 0.01085329 0.01106048 0.01118255 0.0110743 0.01100373 0.01079249 0.01016951 0.0098362 ] mean value: 0.010645294189453125 key: score_time value: [0.01352572 0.01347709 0.01385856 0.01333332 0.01348615 0.01313925 0.01317501 0.01264691 0.01324797 0.01208138] mean value: 0.01319713592529297 key: test_mcc value: [0.65104858 0.7589669 0.57881773 0.553659 0.58076493 0.71921182 0.68736396 0.50862069 0.66268617 0.54377353] mean value: 0.624491332602681 key: train_mcc value: [0.75057007 0.74270775 0.75884232 0.75083654 0.7548331 0.75486659 0.77099303 0.73114227 0.77027873 0.75887891] mean value: 0.7543949299234101 key: test_accuracy value: [0.8245614 0.87719298 0.78947368 0.77192982 0.78947368 0.85964912 0.84210526 0.75438596 0.8245614 0.77192982] mean value: 0.8105263157894737 key: train_accuracy value: [0.87524366 0.87134503 0.8791423 0.87524366 0.87719298 0.87719298 0.88499025 0.86549708 0.88499025 0.8791423 ] mean value: 0.8769980506822612 key: test_fscore value: [0.82142857 0.87272727 0.79310345 0.75471698 0.78571429 0.85714286 0.83018868 0.75 0.8 0.76363636] mean value: 0.8028658459302571 key: train_fscore value: [0.87401575 0.87058824 0.87649402 0.87301587 0.87475149 0.87524752 0.88223553 0.86444008 0.88362919 0.87698413] mean value: 0.8751401821885225 key: test_precision value: [0.85185185 0.92307692 0.79310345 0.83333333 0.81481481 0.85714286 0.88 0.75 0.90909091 0.77777778] mean value: 0.8390191915364329 key: train_precision value: [0.88095238 0.87401575 0.89430894 0.88709677 0.89068826 0.89112903 0.9057377 0.87301587 0.896 0.89473684] mean 
value: 0.8887681557673401 key: test_recall value: [0.79310345 0.82758621 0.79310345 0.68965517 0.75862069 0.85714286 0.78571429 0.75 0.71428571 0.75 ] mean value: 0.7719211822660098 key: train_recall value: [0.8671875 0.8671875 0.859375 0.859375 0.859375 0.85992218 0.85992218 0.85603113 0.87159533 0.85992218] mean value: 0.861989299610895 key: test_roc_auc value: [0.82512315 0.87807882 0.78940887 0.77339901 0.79002463 0.85960591 0.841133 0.75431034 0.8226601 0.77155172] mean value: 0.8105295566502463 key: train_roc_auc value: [0.87522799 0.87133694 0.87910384 0.87521279 0.87715832 0.87722671 0.88503921 0.86551556 0.88501642 0.87917984] mean value: 0.8770017631322957 key: test_jcc value: [0.6969697 0.77419355 0.65714286 0.60606061 0.64705882 0.75 0.70967742 0.6 0.66666667 0.61764706] mean value: 0.6725416676934703 key: train_jcc value: [0.77622378 0.77083333 0.78014184 0.77464789 0.77738516 0.77816901 0.78928571 0.76124567 0.79151943 0.78091873] mean value: 0.778037056551816 MCC on Blind test: 0.28 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02196026 0.02291179 0.02152228 0.02228761 0.02168727 0.02183843 0.02566576 0.02143693 0.02469206 0.0216434 ] mean value: 0.02256457805633545 key: score_time value: [0.01259232 0.01201963 0.01186705 0.01171136 0.01183677 0.0119884 0.01219893 0.01177049 0.01200604 0.01177382] mean value: 0.011976480484008789 key: test_mcc value: [0.8953202 0.96547546 0.72064772 0.82880708 0.54377353 0.7589669 0.76689254 0.59358067 0.86189955 0.79778885] mean value: 0.7733152498837434 key: train_mcc value: [0.83278097 0.84616083 0.847201 0.86134265 0.8306883 0.83376616 0.83513583 0.85135684 0.82366838 0.85557912] mean value: 0.8417680092666877 key: test_accuracy value: [0.94736842 0.98245614 0.85964912 0.9122807 0.77192982 0.87719298 0.87719298 0.78947368 0.92982456 0.89473684] mean value: 0.8842105263157894 key: train_accuracy value: [0.91423002 0.92202729 0.92202729 0.92982456 0.9122807 0.91617934 0.91617934 0.92397661 0.91033138 0.92592593] mean value: 0.9192982456140351 key: test_fscore value: [0.94736842 0.98305085 0.86666667 0.91803279 0.77966102 0.88135593 0.8852459 0.80645161 0.93103448 0.9 ] mean value: 0.8898867668515904 key: train_fscore value: [0.91821561 0.9245283 0.92509363 0.93181818 0.91712707 0.91871456 0.91962617 0.9273743 0.9141791 0.92936803] mean value: 0.9226044961753141 key: test_precision value: [0.96428571 0.96666667 0.83870968 0.875 0.76666667 0.83870968 0.81818182 0.73529412 0.9 0.84375 ] mean value: 0.8547264338286634 key: train_precision value: [0.87588652 0.89416058 0.88848921 0.90441176 0.86759582 0.89338235 0.88489209 0.88928571 0.8781362 0.88967972] 
mean value: 0.886591997049577 key: test_recall value: [0.93103448 1. 0.89655172 0.96551724 0.79310345 0.92857143 0.96428571 0.89285714 0.96428571 0.96428571] mean value: 0.9300492610837439 key: train_recall value: [0.96484375 0.95703125 0.96484375 0.9609375 0.97265625 0.94552529 0.95719844 0.9688716 0.95330739 0.97276265] mean value: 0.9617977869649805 key: test_roc_auc value: [0.9476601 0.98214286 0.85899015 0.91133005 0.77155172 0.87807882 0.87869458 0.79125616 0.93041872 0.89593596] mean value: 0.8846059113300493 key: train_roc_auc value: [0.91432849 0.92209539 0.92211059 0.92988509 0.91239816 0.91612202 0.91609922 0.92388892 0.91024745 0.92583445] mean value: 0.9193009788424125 key: test_jcc value: [0.9 0.96666667 0.76470588 0.84848485 0.63888889 0.78787879 0.79411765 0.67567568 0.87096774 0.81818182] mean value: 0.8065567957123935 key: train_jcc value: [0.84879725 0.85964912 0.86062718 0.87234043 0.84693878 0.84965035 0.85121107 0.86458333 0.8419244 0.86805556] mean value: 0.856377746223762 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.87534595 1.95227742 1.74219894 2.32791281 2.08549452 2.0482564 2.05750775 2.10308409 2.06708074 1.92465091] mean value: 2.018380951881409 key: score_time value: [0.01258206 0.01540279 0.02229261 0.01906872 0.01492906 0.01484323 0.01455307 0.01488876 0.01322269 0.01295877] mean value: 0.015474176406860352 key: test_mcc value: [0.8951918 0.8951918 0.8615634 0.82942474 0.72133224 0.86189955 0.79161589 0.85960591 0.78940887 0.82512315] mean value: 0.8330357351831268 key: train_mcc value: [0.98831147 0.99610895 0.99610895 0.99610895 0.99610895 0.9922027 0.99610889 0.99610895 0.9922027 0.9922027 ] mean value: 0.9941573207007584 key: test_accuracy value: [0.94736842 0.94736842 0.92982456 0.9122807 0.85964912 0.92982456 0.89473684 0.92982456 0.89473684 0.9122807 ] mean value: 0.9157894736842105 key: train_accuracy value: [0.99415205 0.99805068 0.99805068 0.99805068 0.99805068 0.99610136 0.99805068 0.99805068 0.99610136 0.99610136] mean value: 0.9970760233918128 key: test_fscore value: [0.94915254 0.94915254 0.93333333 0.90909091 0.85714286 0.93103448 0.89655172 0.92857143 0.89285714 0.9122807 ] mean value: 0.9159167664392371 key: train_fscore value: [0.99412916 0.99805068 0.99805068 0.99805068 0.99805068 
0.99610895 0.99805825 0.99805068 0.99610895 0.99610895] mean value: 0.9970767670494975 key: test_precision value: [0.93333333 0.93333333 0.90322581 0.96153846 0.88888889 0.9 0.86666667 0.92857143 0.89285714 0.89655172] mean value: 0.91049667857788 key: train_precision value: [0.99607843 0.99610895 0.99610895 0.99610895 0.99610895 0.99610895 0.99612403 1. 0.99610895 0.99610895] mean value: 0.9964965108294698 key: test_recall value: [0.96551724 0.96551724 0.96551724 0.86206897 0.82758621 0.96428571 0.92857143 0.92857143 0.89285714 0.92857143] mean value: 0.9229064039408867 key: train_recall value: [0.9921875 1. 1. 1. 1. 0.99610895 1. 0.99610895 0.99610895 0.99610895] mean value: 0.997662329766537 key: test_roc_auc value: [0.94704433 0.94704433 0.92918719 0.91317734 0.86022167 0.93041872 0.8953202 0.92980296 0.89470443 0.91256158] mean value: 0.915948275862069 key: train_roc_auc value: [0.99414822 0.99805447 0.99805447 0.99805447 0.99805447 0.99610135 0.99804688 0.99805447 0.99610135 0.99610135] mean value: 0.997077152237354 key: test_jcc value: [0.90322581 0.90322581 0.875 0.83333333 0.75 0.87096774 0.8125 0.86666667 0.80645161 0.83870968] mean value: 0.846008064516129 key: train_jcc value: [0.98832685 0.99610895 0.99610895 0.99610895 0.99610895 0.99224806 0.99612403 0.99610895 0.99224806 0.99224806] mean value: 0.9941739812385003 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), 
('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03231049 0.02459979 0.0224762 0.02270222 0.02196383 0.02153134 0.02374744 0.02357459 0.02356982 0.02394891] mean value: 0.024042463302612303 key: score_time value: [0.01236773 0.00948834 0.00888968 0.00900292 0.00900364 0.00897312 0.00892997 0.00917411 0.00890827 0.00901175] mean value: 0.00937495231628418 key: test_mcc value: [0.96551724 0.93202124 0.92980296 0.92980296 0.8953202 0.85960591 0.96551724 0.89952865 0.92980296 0.96547546] mean value: 0.9272394802485838 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.96491228 0.94736842 0.92982456 0.98245614 0.94736842 0.96491228 0.98245614] mean value: 0.9631578947368421 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.96666667 0.96551724 0.96551724 0.94736842 0.92857143 0.98245614 0.94339623 0.96428571 0.98181818] mean value: 0.9628053402270093 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93548387 0.96551724 0.96551724 0.96428571 0.92857143 0.96551724 1. 0.96428571 1. ] mean value: 0.968917845224853 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 0.96551724 0.96551724 0.93103448 0.92857143 1. 0.89285714 0.96428571 0.96428571] mean value: 0.9577586206896552 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98275862 0.96428571 0.96490148 0.96490148 0.9476601 0.92980296 0.98275862 0.94642857 0.96490148 0.98214286] mean value: 0.9630541871921183 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.93548387 0.93333333 0.93333333 0.9 0.86666667 0.96551724 0.89285714 0.93103448 0.96428571] mean value: 0.9288029026961174 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12038136 0.12122679 0.119277 0.12026787 0.12030101 0.12140536 0.12146163 0.11938405 0.12679267 0.11983061] mean value: 0.12103283405303955 key: score_time value: [0.01790071 0.01808691 0.01780415 0.01798964 0.01912498 0.01788235 0.01818609 0.01802516 0.01791906 0.01787663] mean value: 0.018079566955566406 key: test_mcc value: [0.8951918 0.8953202 0.8615634 0.82490815 0.68850906 0.89988258 0.82942474 0.78940887 0.9321832 0.89988258] mean value: 0.851627457182217 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.94736842 0.94736842 0.92982456 0.9122807 0.84210526 0.94736842 0.9122807 0.89473684 0.96491228 0.94736842] mean value: 0.9245614035087719 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 0.94736842 0.93333333 0.91525424 0.83636364 0.94915254 0.91525424 0.89285714 0.96551724 0.94915254] mean value: 0.925340587668097 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 0.96428571 0.90322581 0.9 0.88461538 0.90322581 0.87096774 0.89285714 0.93333333 0.90322581] mean value: 0.908907006971523 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.93103448 0.96551724 0.93103448 0.79310345 1. 0.96428571 0.89285714 1. 1. ] mean value: 0.9443349753694581 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94704433 0.9476601 0.92918719 0.91194581 0.8429803 0.94827586 0.91317734 0.89470443 0.96551724 0.94827586] mean value: 0.9248768472906403 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 0.9 0.875 0.84375 0.71875 0.90322581 0.84375 0.80645161 0.93333333 0.90322581] mean value: 0.8630712365591398 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.75 Accuracy on Blind test: 0.89 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01026869 0.01040196 0.01036501 0.01076531 0.01043916 0.01040483 0.01052999 0.01043129 0.01045632 0.01034832] mean value: 0.010441088676452636 key: score_time value: [0.00883389 0.00883102 0.00883794 0.00973797 0.00897551 0.00892496 0.00891161 0.0089035 0.00878906 0.00879788] mean value: 0.008954334259033202 key: test_mcc value: [0.54592083 0.54433498 0.58076493 0.54592083 0.30745722 0.33621986 0.71921182 0.26802813 0.72133224 0.54592083] mean value: 0.5115111682505243 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.77192982 0.77192982 0.78947368 0.77192982 0.64912281 0.66666667 0.85964912 0.63157895 0.85964912 0.77192982] mean value: 0.7543859649122807 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.78688525 0.77192982 0.78571429 0.78688525 0.61538462 0.6779661 0.85714286 0.57142857 0.86206897 0.75471698] mean value: 0.7470122694379244 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.75 0.78571429 0.81481481 0.75 0.69565217 0.64516129 0.85714286 0.66666667 0.83333333 0.8 ] mean value: 0.7598485421907581 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.82758621 0.75862069 0.75862069 0.82758621 0.55172414 0.71428571 0.85714286 0.5 0.89285714 0.71428571] mean value: 0.7402709359605911 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77093596 0.77216749 0.79002463 0.77093596 0.65086207 0.66748768 0.85960591 0.62931034 0.86022167 0.77093596] mean value: 0.7542487684729063 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64864865 0.62857143 0.64705882 0.64864865 0.44444444 0.51282051 0.75 0.4 0.75757576 0.60606061] mean value: 0.6043828870299459 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.86249709 1.87444353 1.87393451 1.88897467 1.92013979 1.87955761 1.87557983 1.8870728 1.8653574 1.87198281] mean value: 1.8799540042877196 key: score_time value: [0.09142756 0.09387064 0.09172773 0.09915471 0.09149933 0.09835124 0.0914166 0.09191871 0.09128451 0.0958066 ] mean value: 0.0936457633972168 key: test_mcc value: [1. 0.93202124 0.93202124 0.8951918 0.8951918 0.89988258 0.9321832 0.92980296 0.96551724 0.96551724] mean value: 0.9347329303798759 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.96491228 0.96491228 0.94736842 0.94736842 0.94736842 0.96491228 0.96491228 0.98245614 0.98245614] mean value: 0.9666666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.96666667 0.96666667 0.94915254 0.94915254 0.94915254 0.96551724 0.96428571 0.98245614 0.98245614] mean value: 0.9675506196818756 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93548387 0.93548387 0.93333333 0.93333333 0.90322581 0.93333333 0.96428571 0.96551724 0.96551724] mean value: 0.9469513745431432 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.96551724 0.96551724 1. 1. 0.96428571 1. 1. ] mean value: 0.9895320197044335 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.96428571 0.96428571 0.94704433 0.94704433 0.94827586 0.96551724 0.96490148 0.98275862 0.98275862] mean value: 0.9666871921182266 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.93548387 0.93548387 0.90322581 0.90322581 0.90322581 0.93333333 0.93103448 0.96551724 0.96551724] mean value: 0.9376047460140897 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, 
validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.07170582 0.97667527 1.01005721 0.98215485 0.97630405 0.96141648 0.98975873 1.01708627 0.98110008 1.01926374] mean value: 0.9985522508621216 key: score_time value: [0.24334216 0.19834042 0.2711153 0.20498848 0.18867207 0.28040957 0.27722764 0.26618004 0.23420763 0.25020599] mean value: 0.2414689302444458 key: test_mcc value: [0.96547546 0.96547546 0.93202124 0.8615634 0.8615634 0.86851042 0.9321832 0.8953202 0.96551724 0.96551724] mean value: 0.9213147255333994 key: train_mcc value: [0.9652735 0.96907736 0.97289533 0.96907736 0.965509 0.98057338 0.96526984 0.96907457 0.96526984 0.96907457] mean value: 0.9691094739529907 key: test_accuracy value: [0.98245614 0.98245614 0.96491228 0.92982456 0.92982456 0.92982456 0.96491228 0.94736842 0.98245614 0.98245614] mean value: 0.9596491228070175 key: train_accuracy value: [0.98245614 0.98440546 0.98635478 0.98440546 0.98245614 0.99025341 0.98245614 0.98440546 0.98245614 0.98440546] mean value: 0.9844054580896686 key: test_fscore value: [0.98305085 0.98305085 0.96666667 0.93333333 0.93333333 0.93333333 0.96551724 0.94736842 0.98245614 0.98245614] mean value: 0.9610566304715618 key: train_fscore value: [0.98265896 0.98455598 0.98646035 0.98455598 0.98272553 0.99032882 0.98272553 0.98461538 0.98272553 0.98461538] mean value: 0.9845967449652123 key: test_precision value: [0.96666667 0.96666667 0.93548387 0.90322581 0.90322581 0.875 0.93333333 0.93103448 0.96551724 0.96551724] mean value: 0.9345671116054876 key: train_precision value: [0.96958175 0.97328244 
0.97701149 0.97328244 0.96603774 0.98461538 0.96969697 0.97338403 0.96969697 0.97338403] mean value: 0.9729973249493369 key: test_recall value: [1. 1. 1. 0.96551724 0.96551724 1. 1. 0.96428571 1. 1. ] mean value: 0.9895320197044335 key: train_recall value: [0.99609375 0.99609375 0.99609375 0.99609375 1. 0.99610895 0.99610895 0.99610895 0.99610895 0.99610895] mean value: 0.9964919747081712 key: test_roc_auc value: [0.98214286 0.98214286 0.96428571 0.92918719 0.92918719 0.93103448 0.96551724 0.9476601 0.98275862 0.98275862] mean value: 0.9596674876847291 key: train_roc_auc value: [0.98248267 0.9844282 0.98637372 0.9844282 0.98249027 0.99024197 0.98242947 0.9843826 0.98242947 0.9843826 ] mean value: 0.9844069187743191 key: test_jcc value: [0.96666667 0.96666667 0.93548387 0.875 0.875 0.875 0.93333333 0.9 0.96551724 0.96551724] mean value: 0.9258185020393029 key: train_jcc value: [0.96590909 0.96958175 0.97328244 0.96958175 0.96603774 0.98084291 0.96603774 0.96969697 0.96603774 0.96969697] mean value: 0.9696705090574546 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02531004 0.01198626 0.01208639 0.01204205 0.01192236 0.01199722 0.01211691 0.01198721 0.01262665 0.0113945 ] mean value: 0.01334695816040039 key: score_time value: [0.01192689 0.00937462 0.01031423 0.00997972 0.01006103 0.01001763 0.01013184 0.01009941 0.00948906 0.00999475] mean value: 0.010138916969299316 key: test_mcc value: [0.71921182 0.86189955 0.71921182 0.65104858 0.51048128 0.7589669 0.82942474 0.69581469 0.78940887 0.65018988] mean value: 0.7185658130544192 key: train_mcc value: [0.75451908 0.73892092 0.74278722 0.75068043 0.77951916 0.75838325 0.75443104 0.75058523 0.7505054 0.75048638] mean value: 0.7530818106690766 key: test_accuracy value: [0.85964912 0.92982456 0.85964912 0.8245614 0.75438596 0.87719298 0.9122807 0.84210526 0.89473684 0.8245614 ] mean value: 0.8578947368421053 key: train_accuracy value: [0.87719298 0.86939571 0.87134503 0.87524366 0.88888889 0.8791423 0.87719298 0.87524366 0.87524366 0.87524366] mean value: 0.8764132553606238 key: test_fscore value: [0.86206897 0.92857143 0.86206897 0.82142857 0.75 0.88135593 0.91525424 0.85245902 0.89285714 0.81481481] mean value: 0.8580879074591409 key: train_fscore value: [0.87573964 0.8678501 0.87209302 0.87351779 0.89224953 0.87843137 0.87814313 0.8745098 0.87596899 0.87548638] mean value: 0.8763989764320921 key: test_precision value: [0.86206897 0.96296296 0.86206897 0.85185185 0.77777778 0.83870968 0.87096774 0.78787879 0.89285714 0.84615385] mean value: 0.8553297719871691 key: train_precision value: [0.88446215 0.87649402 0.86538462 0.884 0.86446886 0.88537549 0.87307692 0.88142292 0.87258687 
0.87548638] mean value: 0.876275825111137 key: test_recall value: [0.86206897 0.89655172 0.86206897 0.79310345 0.72413793 0.92857143 0.96428571 0.92857143 0.89285714 0.78571429] mean value: 0.8637931034482759 key: train_recall value: [0.8671875 0.859375 0.87890625 0.86328125 0.921875 0.87159533 0.88326848 0.86770428 0.87937743 0.87548638] mean value: 0.8768056906614786 key: test_roc_auc value: [0.85960591 0.93041872 0.85960591 0.82512315 0.75492611 0.87807882 0.91317734 0.84359606 0.89470443 0.82389163] mean value: 0.8583128078817734 key: train_roc_auc value: [0.87717352 0.86937622 0.87135974 0.87522039 0.88895306 0.87915704 0.87718112 0.87525839 0.87523559 0.87524319] mean value: 0.8764158256322957 key: test_jcc value: [0.75757576 0.86666667 0.75757576 0.6969697 0.6 0.78787879 0.84375 0.74285714 0.80645161 0.6875 ] mean value: 0.7547225422427035 key: train_jcc value: [0.77894737 0.76655052 0.77319588 0.7754386 0.80546075 0.78321678 0.78275862 0.77700348 0.77931034 0.77854671] mean value: 0.7800429060559617 MCC on Blind test: 0.6 Accuracy on Blind test: 0.81 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10502219 0.07858324 0.08344984 0.08857894 0.08617067 0.090137 0.08256125 0.0888052 0.08122015 0.08864975] mean value: 0.0873178243637085 key: score_time value: [0.01173377 0.01130414 0.01159191 0.01140666 0.01149297 0.01123619 0.01140857 0.01112556 0.01298761 0.01339841] mean value: 0.011768579483032227 key: test_mcc value: [0.96551724 0.96547546 0.93202124 0.92980296 0.96551724 0.8953202 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.9550206057552516 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 0.96491228 0.96491228 0.98245614 0.94736842 0.98245614 0.98245614 0.98245614 1. ] mean value: 0.9771929824561403 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98305085 0.96666667 0.96551724 0.98245614 0.94736842 0.98245614 0.98245614 0.98245614 1. ] mean value: 0.9774883878310622 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96666667 0.93548387 0.96551724 1. 0.93103448 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.969525398591027 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 1. 0.96551724 0.96551724 0.96428571 1. 1. 1. 1. ] mean value: 0.9860837438423645 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98214286 0.96428571 0.96490148 0.98275862 0.9476601 0.98275862 0.98275862 0.98275862 1. ] mean value: 0.9772783251231528 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96666667 0.93548387 0.93333333 0.96551724 0.9 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.9563070077864294 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04534984 0.05460882 0.04289627 0.07200694 0.07619452 0.06920385 0.04467583 0.07594752 0.07032585 0.07389331] mean value: 0.06251027584075927 key: score_time value: [0.021312 0.01302409 0.01268458 0.01460743 0.02094841 0.01261997 0.01289773 0.01310945 0.01993465 0.01473045] mean value: 0.01558687686920166 key: test_mcc value: [0.89988258 0.9321832 0.85960591 0.8953202 0.86189955 0.86189955 0.96547546 0.82880708 1. 
0.85960591] mean value: 0.8964679428369845 key: train_mcc value: [0.97277661 0.97672758 0.98051435 0.98057426 0.98057426 0.97663743 0.97277537 0.97663743 0.97277537 0.97277537] mean value: 0.9762768016203472 key: test_accuracy value: [0.94736842 0.96491228 0.92982456 0.94736842 0.92982456 0.92982456 0.98245614 0.9122807 1. 0.92982456] mean value: 0.9473684210526315 key: train_accuracy value: [0.98635478 0.98830409 0.99025341 0.99025341 0.99025341 0.98830409 0.98635478 0.98830409 0.98635478 0.98635478] mean value: 0.9881091617933723 key: test_fscore value: [0.94545455 0.96428571 0.93103448 0.94736842 0.92857143 0.93103448 0.98181818 0.90566038 1. 0.92857143] mean value: 0.9463799062629662 key: train_fscore value: [0.98640777 0.98837209 0.99025341 0.99029126 0.99029126 0.98837209 0.98646035 0.98837209 0.98646035 0.98646035] mean value: 0.9881741026125374 key: test_precision value: [1. 1. 0.93103448 0.96428571 0.96296296 0.9 1. 0.96 1. 0.92857143] mean value: 0.9646854588578726 key: train_precision value: [0.98069498 0.98076923 0.98832685 0.98455598 0.98455598 0.98455598 0.98076923 0.98455598 0.98076923 0.98076923] mean value: 0.9830322690244869 key: test_recall value: [0.89655172 0.93103448 0.93103448 0.93103448 0.89655172 0.96428571 0.96428571 0.85714286 1. 0.92857143] mean value: 0.9300492610837439 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.99609375 0.99609375 0.9922179 0.9922179 0.9922179 0.9922179 0.9922179 ] mean value: 0.9933745744163425 key: test_roc_auc value: [0.94827586 0.96551724 0.92980296 0.9476601 0.93041872 0.93041872 0.98214286 0.91133005 1. 0.92980296] mean value: 0.9475369458128079 key: train_roc_auc value: [0.98636612 0.98831925 0.99025717 0.99026477 0.99026477 0.98829645 0.98634332 0.98829645 0.98634332 0.98634332] mean value: 0.9881094965953308 key: test_jcc value: [0.89655172 0.93103448 0.87096774 0.9 0.86666667 0.87096774 0.96428571 0.82758621 1. 
0.86666667] mean value: 0.8994726945283119 key: train_jcc value: [0.97318008 0.97701149 0.98069498 0.98076923 0.98076923 0.97701149 0.97328244 0.97701149 0.97328244 0.97328244] mean value: 0.976629532986469 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02329779 0.01084447 0.01144338 0.01139617 0.01017046 0.0110743 0.01069474 0.01005888 0.01109362 0.01126432] mean value: 0.01213381290435791 key: score_time value: [0.0097506 0.00933862 0.00944877 0.00938034 0.00893426 0.00938964 0.00931454 0.00969791 0.0094347 0.00964665] mean value: 0.009433603286743164 key: test_mcc value: [0.86189955 0.96547546 0.68434084 0.75462449 0.47413793 0.7589669 0.7366424 0.68850906 0.82512315 0.75492611] mean value: 0.7504645885346989 key: train_mcc value: [0.73976678 0.72388482 0.76290396 0.75133166 0.78043156 0.74303497 0.75884232 0.76719997 0.73629377 0.75505926] mean value: 0.7518749069827675 key: test_accuracy value: [0.92982456 0.98245614 0.84210526 0.87719298 0.73684211 0.87719298 0.85964912 0.84210526 0.9122807 0.87719298] mean value: 0.8736842105263157 key: train_accuracy value: [0.86939571 0.86159844 0.88109162 0.87524366 0.88888889 0.87134503 0.8791423 0.88304094 0.86744639 0.87719298] mean value: 0.875438596491228 key: 
test_fscore value: [0.92857143 0.98305085 0.84745763 0.88135593 0.73684211 0.88135593 0.87096774 0.84745763 0.9122807 0.87719298] mean value: 0.8766532926082291 key: train_fscore value: [0.87238095 0.86424474 0.8833652 0.8778626 0.89305816 0.87356322 0.88167939 0.88636364 0.87169811 0.88 ] mean value: 0.8784216009065232 key: test_precision value: [0.96296296 0.96666667 0.83333333 0.86666667 0.75 0.83870968 0.79411765 0.80645161 0.89655172 0.86206897] mean value: 0.8577529256666206 key: train_precision value: [0.85130112 0.84644195 0.86516854 0.85820896 0.85920578 0.86037736 0.86516854 0.86346863 0.84615385 0.8619403 ] mean value: 0.8577435010694252 key: test_recall value: [0.89655172 1. 0.86206897 0.89655172 0.72413793 0.92857143 0.96428571 0.89285714 0.92857143 0.89285714] mean value: 0.8986453201970444 key: train_recall value: [0.89453125 0.8828125 0.90234375 0.8984375 0.9296875 0.88715953 0.89883268 0.91050584 0.89883268 0.89883268] mean value: 0.9001975924124513 key: test_roc_auc value: [0.93041872 0.98214286 0.84174877 0.87684729 0.73706897 0.87807882 0.8614532 0.8429803 0.91256158 0.87746305] mean value: 0.8740763546798029 key: train_roc_auc value: [0.86944461 0.86163971 0.88113296 0.87528879 0.88896826 0.87131414 0.87910384 0.88298729 0.86738509 0.87715072] mean value: 0.8754415430447471 key: test_jcc value: [0.86666667 0.96666667 0.73529412 0.78787879 0.58333333 0.78787879 0.77142857 0.73529412 0.83870968 0.78125 ] mean value: 0.7854400726566286 key: train_jcc value: [0.77364865 0.76094276 0.79109589 0.78231293 0.80677966 0.7755102 0.7883959 0.79591837 0.77257525 0.78571429] mean value: 0.7832893898605223 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), 
('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 
'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02009606 0.02367425 0.02581382 0.02568173 0.02375722 0.02464199 0.02251601 0.02397561 0.02633715 0.02839375] mean value: 0.02448875904083252 key: score_time value: [0.01082706 0.0116818 0.01224518 0.01217031 0.01221275 0.01211119 0.01224017 0.01211429 0.01587439 0.01224256] mean value: 0.012371969223022462 key: test_mcc value: [0.89988258 0.92980296 0.55317854 0.92980296 0.76689254 0.86189955 0.76689254 0.92980296 0.92980296 0.96547546] mean value: 0.8533433009431273 key: train_mcc value: [0.93120523 0.9652735 0.6841678 0.95324446 0.96491921 0.97663814 0.93901501 0.95324588 0.97289329 0.97271663] mean value: 0.9313319164525132 key: test_accuracy value: [0.94736842 0.96491228 0.73684211 0.96491228 0.87719298 0.92982456 0.87719298 0.96491228 0.96491228 0.98245614] mean value: 0.9210526315789473 key: train_accuracy value: [0.96491228 0.98245614 0.81871345 0.97660819 0.98245614 0.98830409 0.96881092 0.97660819 0.98635478 0.98635478] mean value: 0.9631578947368421 key: test_fscore value: [0.94545455 0.96551724 0.79452055 0.96551724 0.86792453 0.93103448 0.8852459 0.96428571 0.96428571 0.98181818] mean value: 0.9265604099247834 key: train_fscore value: [0.96385542 0.98265896 0.84628099 0.97647059 0.98238748 0.98828125 0.96969697 0.9765625 0.98651252 0.98640777] mean value: 0.965911444750535 key: test_precision value: [1. 0.96551724 0.65909091 0.96551724 0.95833333 0.9 0.81818182 0.96428571 0.96428571 1. 
] mean value: 0.919521197193611 key: train_precision value: [0.99173554 0.96958175 0.73352436 0.98031496 0.98431373 0.99215686 0.94464945 0.98039216 0.97709924 0.98449612] mean value: 0.9538264154435027 key: test_recall value: [0.89655172 0.96551724 1. 0.96551724 0.79310345 0.96428571 0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9442118226600985 key: train_recall value: [0.9375 0.99609375 1. 0.97265625 0.98046875 0.9844358 0.99610895 0.97276265 0.99610895 0.98832685] mean value: 0.9824461940661479 key: test_roc_auc value: [0.94827586 0.96490148 0.73214286 0.96490148 0.87869458 0.93041872 0.87869458 0.96490148 0.96490148 0.98214286] mean value: 0.9209975369458129 key: train_roc_auc value: [0.96485895 0.98248267 0.81906615 0.9766005 0.98245227 0.98831165 0.9687576 0.9766157 0.98633572 0.98635092] mean value: 0.9631832137645915 key: test_jcc value: [0.89655172 0.93333333 0.65909091 0.93333333 0.76666667 0.87096774 0.79411765 0.93103448 0.93103448 0.96428571] mean value: 0.8680416035359436 key: train_jcc value: [0.93023256 0.96590909 0.73352436 0.95402299 0.96538462 0.97683398 0.94117647 0.95419847 0.97338403 0.97318008] mean value: 0.9367846635991106 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01736093 0.02417779 0.01855564 0.02388573 0.02126384 0.02123237 0.02325678 0.02157664 0.0230937 0.01871109] mean value: 0.021311450004577636 key: score_time value: [0.01171088 0.01253772 0.01214719 0.01223183 0.01217747 0.01221824 0.01223803 0.01537561 0.0122745 0.01589489] mean value: 0.012880635261535645 key: test_mcc value: [0.93202124 0.92980296 0.7589669 0.8615634 0.75047877 0.89988258 0.86789789 0.86851042 0.8953202 0.89988258] mean value: 0.8664326936672224 key: train_mcc value: [0.92383514 0.9652735 0.91600355 0.96509685 0.83295731 0.96147894 0.96497895 0.9496686 0.93531646 0.93864387] mean value: 0.9353253163476655 key: test_accuracy value: [0.96491228 0.96491228 0.87719298 0.92982456 0.85964912 0.94736842 0.92982456 0.92982456 0.94736842 0.94736842] mean value: 0.9298245614035088 key: train_accuracy value: [0.96101365 0.98245614 0.95711501 0.98245614 0.91033138 0.98050682 0.98245614 0.97465887 0.9668616 0.96881092] mean value: 0.9666666666666667 key: test_fscore value: [0.96666667 0.96551724 0.87272727 0.93333333 0.84 0.94915254 0.92307692 0.93333333 0.94736842 0.94915254] mean value: 0.9280328276315234 key: train_fscore value: [0.96212121 0.98265896 0.95564516 0.98259188 0.9017094 0.98084291 0.98238748 0.97504798 0.96786389 0.96958175] mean value: 0.9660450626117191 key: test_precision value: [0.93548387 0.96551724 0.92307692 0.90322581 1. 0.90322581 1. 
0.875 0.93103448 0.90322581] mean value: 0.9339789937537435 key: train_precision value: [0.93382353 0.96958175 0.9875 0.97318008 0.99528302 0.96603774 0.98818898 0.96212121 0.94117647 0.94795539] mean value: 0.9664848159228501 key: test_recall value: [1. 0.96551724 0.82758621 0.96551724 0.72413793 1. 0.85714286 1. 0.96428571 1. ] mean value: 0.9304187192118226 key: train_recall value: [0.9921875 0.99609375 0.92578125 0.9921875 0.82421875 0.99610895 0.9766537 0.98832685 0.99610895 0.9922179 ] mean value: 0.9679885092412451 key: test_roc_auc value: [0.96428571 0.96490148 0.87807882 0.92918719 0.86206897 0.94827586 0.92857143 0.93103448 0.9476601 0.94827586] mean value: 0.9302339901477833 key: train_roc_auc value: [0.96107429 0.98248267 0.95705405 0.98247507 0.91016385 0.98047635 0.98246747 0.97463217 0.96680447 0.9687652 ] mean value: 0.966639561040856 key: test_jcc value: [0.93548387 0.93333333 0.77419355 0.875 0.72413793 0.90322581 0.85714286 0.875 0.9 0.90322581] mean value: 0.8680743153768737 key: train_jcc value: [0.9270073 0.96590909 0.91505792 0.96577947 0.82101167 0.96240602 0.96538462 0.95131086 0.93772894 0.94095941] mean value: 0.9352555285237902 MCC on Blind test: 0.88 Accuracy on Blind test: 0.94 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.22136092 0.2146697 0.22013402 0.21376967 0.21607924 0.21981454 0.21257806 0.2166841 0.21159649 0.21193361] mean value: 0.21586203575134277 key: score_time value: [0.01560521 0.01587868 0.01571321 0.01564407 0.01641417 0.01539636 0.01562452 0.01541305 0.01549864 0.01545691] mean value: 0.01566448211669922 key: test_mcc value: [0.96551724 0.96551724 0.93202124 0.96547546 0.96551724 0.86851042 0.96551724 0.93202124 1. 0.92980296] mean value: 0.9489900277779855 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 0.96491228 0.98245614 0.98245614 0.92982456 0.98245614 0.96491228 1. 0.96491228] mean value: 0.9736842105263157 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 0.96666667 0.98305085 0.98245614 0.93333333 0.98245614 0.96296296 1. 0.96428571] mean value: 0.9740124086109813 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.93548387 0.96666667 1. 0.875 0.96551724 1. 1. 0.96428571] mean value: 0.9706953493299433 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 1. 0.96551724 1. 1. 0.92857143 1. 0.96428571] mean value: 0.9789408866995074 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 0.96428571 0.98214286 0.98275862 0.93103448 0.98275862 0.96428571 1. 
0.96490148] mean value: 0.973768472906404 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 0.93548387 0.96666667 0.96551724 0.875 0.96551724 0.92857143 1. 0.93103448] mean value: 0.94988254144817 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.07275224 0.07551146 0.07736611 0.08631492 0.08354521 0.08567595 0.09274006 0.08661175 0.08095288 0.07556176] mean value: 0.08170323371887207 key: score_time value: [0.02348256 0.03980899 0.02562881 0.04097319 0.04085207 0.04124117 0.03203225 0.03526425 0.02682853 0.02724028] mean value: 0.033335208892822266 key: test_mcc value: [0.96551724 0.96551724 0.93202124 0.92980296 0.8953202 0.85960591 0.92980296 0.96547546 0.96551724 0.96547546] mean value: 0.9374055898309153 key: train_mcc value: [0.9922027 0.9922027 1. 0.9922027 0.99223298 0.99223298 0.9922027 0.99610895 0.98831165 0.99610895] mean value: 0.9933806305949693 key: test_accuracy value: [0.98245614 0.98245614 0.96491228 0.96491228 0.94736842 0.92982456 0.96491228 0.98245614 0.98245614 0.98245614] mean value: 0.968421052631579 key: train_accuracy value: [0.99610136 0.99610136 1. 
0.99610136 0.99610136 0.99610136 0.99610136 0.99805068 0.99415205 0.99805068] mean value: 0.9966861598440546 key: test_fscore value: [0.98245614 0.98245614 0.96666667 0.96551724 0.94736842 0.92857143 0.96428571 0.98181818 0.98245614 0.98181818] mean value: 0.9683414256644747 key: train_fscore value: [0.99609375 0.99609375 1. 0.99609375 0.99610895 0.99609375 0.99610895 0.99805068 0.99415205 0.99805068] mean value: 0.9966846310138728 key: test_precision value: [1. 1. 0.93548387 0.96551724 0.96428571 0.92857143 0.96428571 1. 0.96551724 1. ] mean value: 0.972366121086922 key: train_precision value: [0.99609375 0.99609375 1. 0.99609375 0.99224806 1. 0.99610895 1. 0.99609375 1. ] mean value: 0.9972732011431846 key: test_recall value: [0.96551724 0.96551724 1. 0.96551724 0.93103448 0.92857143 0.96428571 0.96428571 1. 0.96428571] mean value: 0.9649014778325123 key: train_recall value: [0.99609375 0.99609375 1. 0.99609375 1. 0.9922179 0.99610895 0.99610895 0.9922179 0.99610895] mean value: 0.9961043895914397 key: test_roc_auc value: [0.98275862 0.98275862 0.96428571 0.96490148 0.9476601 0.92980296 0.96490148 0.98214286 0.98275862 0.98214286] mean value: 0.9684113300492612 key: train_roc_auc value: [0.99610135 0.99610135 1. 0.99610135 0.99610895 0.99610895 0.99610135 0.99805447 0.99415582 0.99805447] mean value: 0.9966888071498055 key: test_jcc value: [0.96551724 0.96551724 0.93548387 0.93333333 0.9 0.86666667 0.93103448 0.96428571 0.96551724 0.96428571] mean value: 0.9391641506435723 key: train_jcc value: [0.9922179 0.9922179 1. 
0.9922179 0.99224806 0.9922179 0.99224806 0.99610895 0.98837209 0.99610895] mean value: 0.9933957711217688 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.22260284 0.25764489 0.23697352 0.23838496 0.2421689 0.22621894 0.18569469 0.18221235 0.21668482 0.20190597] mean value: 0.22104918956756592 key: score_time value: [0.02602887 0.0259223 0.02647781 0.02608204 0.0156517 0.02640224 0.03134131 0.01563168 0.01565242 0.02606297] mean value: 0.023525333404541014 key: test_mcc value: [0.71921182 0.7589669 0.65018988 0.68434084 0.64901478 0.75492611 0.75492611 0.61805122 0.85960591 0.61453202] mean value: 0.706376559482153 key: train_mcc value: [0.98443509 0.9766081 0.98831165 0.98051435 0.9844054 0.9766081 0.97277537 0.98051405 0.9766081 0.9766081 ] mean value: 0.9797388296638977 key: test_accuracy value: [0.85964912 0.87719298 0.8245614 0.84210526 0.8245614 0.87719298 0.87719298 0.80701754 0.92982456 0.80701754] mean value: 0.8526315789473684 key: train_accuracy value: [0.99220273 0.98830409 0.99415205 0.99025341 0.99220273 0.98830409 0.98635478 0.99025341 0.98830409 0.98830409] mean value: 0.9898635477582846 key: test_fscore value: 
[0.86206897 0.87272727 0.83333333 0.84745763 0.82758621 0.87719298 0.87719298 0.81355932 0.92857143 0.80701754] mean value: 0.85467076649703 key: train_fscore value: [0.99215686 0.98828125 0.99415205 0.99025341 0.9921875 0.98832685 0.98646035 0.99029126 0.98832685 0.98832685] mean value: 0.9898763225880247 key: test_precision value: [0.86206897 0.92307692 0.80645161 0.83333333 0.82758621 0.86206897 0.86206897 0.77419355 0.92857143 0.79310345] mean value: 0.8472523397996146 key: train_precision value: [0.99606299 0.98828125 0.9922179 0.98832685 0.9921875 0.98832685 0.98076923 0.98837209 0.98832685 0.98832685] mean value: 0.9891198357747265 key: test_recall value: [0.86206897 0.82758621 0.86206897 0.86206897 0.82758621 0.89285714 0.89285714 0.85714286 0.92857143 0.82142857] mean value: 0.863423645320197 key: train_recall value: [0.98828125 0.98828125 0.99609375 0.9921875 0.9921875 0.98832685 0.9922179 0.9922179 0.98832685 0.98832685] mean value: 0.9906447592412452 key: test_roc_auc value: [0.85960591 0.87807882 0.82389163 0.84174877 0.82450739 0.87746305 0.87746305 0.80788177 0.92980296 0.80726601] mean value: 0.8527709359605911 key: train_roc_auc value: [0.9921951 0.98830405 0.99415582 0.99025717 0.9922027 0.98830405 0.98634332 0.99024957 0.98830405 0.98830405] mean value: 0.9898619892996109 key: test_jcc value: [0.75757576 0.77419355 0.71428571 0.73529412 0.70588235 0.78125 0.78125 0.68571429 0.86666667 0.67647059] mean value: 0.7478583031453051 key: train_jcc value: [0.9844358 0.97683398 0.98837209 0.98069498 0.98449612 0.97692308 0.97328244 0.98076923 0.97692308 0.97692308] mean value: 0.9799653876535144 MCC on Blind test: 0.45 Accuracy on Blind test: 0.75 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest 
Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.86680079 0.85779858 0.87546086 0.87063336 0.85156989 0.8605125 0.87048817 0.85345745 0.84471488 0.86324358] mean value: 0.8614680051803589 key: score_time value: [0.00975132 0.01042104 0.00950241 0.00965095 0.00974298 0.00994873 0.00963378 0.0094924 0.01010132 0.00952649] mean value: 0.009777140617370606 key: test_mcc value: [1. 0.93202124 0.93202124 0.92980296 0.8951918 0.8953202 0.96551724 0.96547546 0.96551724 0.96547546] mean value: 0.9446342833958448 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.96491228 0.96491228 0.96491228 0.94736842 0.94736842 0.98245614 0.98245614 0.98245614 0.98245614] mean value: 0.9719298245614034 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.96666667 0.96666667 0.96551724 0.94915254 0.94736842 0.98245614 0.98181818 0.98245614 0.98181818] mean value: 0.9723920182476274 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93548387 0.93548387 0.96551724 0.93333333 0.93103448 0.96551724 1. 0.96551724 1. ] mean value: 0.9631887282165369 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.96551724 0.96551724 0.96428571 1. 0.96428571 1. 0.96428571] mean value: 0.9823891625615764 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.96428571 0.96428571 0.96490148 0.94704433 0.9476601 0.98275862 0.98214286 0.98275862 0.98214286] mean value: 0.9717980295566503 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.93548387 0.93548387 0.93333333 0.90322581 0.9 0.96551724 0.96428571 0.96551724 0.96428571] mean value: 0.9467132793050479 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), 
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03680491 0.03327942 0.03254247 0.03225303 0.03544617 0.03223276 0.03192067 0.03175664 0.03236866 0.03210115] mean value: 0.03307058811187744 key: score_time value: [0.0125246 0.01293564 0.01295924 0.0146606 0.01284409 0.01471877 0.01512527 0.0162437 0.01573086 0.02103305] mean value: 0.014877581596374511 key: test_mcc value: [0.6317806 0.58358651 0.622444 0.6166424 0.75462449 0.54759338 0.53222729 0.72706729 0.72064772 0.68850906] mean value: 0.6425122759141091 key: train_mcc value: [0.86827667 0.89478174 0.98443556 0.97687436 0.92846644 0.98452465 0.84554532 0.84765255 0.94314064 0.86288853] mean value: 0.9136586453351799 key: test_accuracy value: [0.80701754 0.78947368 0.80701754 0.80701754 0.87719298 0.77192982 0.75438596 0.85964912 0.85964912 
0.84210526] mean value: 0.8175438596491228 key: train_accuracy value: [0.92982456 0.9454191 0.99220273 0.98830409 0.96296296 0.99220273 0.91812865 0.91812865 0.97076023 0.92787524] mean value: 0.9545808966861599 key: test_fscore value: [0.83076923 0.80645161 0.82539683 0.81967213 0.88135593 0.77966102 0.78125 0.86666667 0.85185185 0.84745763] mean value: 0.8290532895006528 key: train_fscore value: [0.93430657 0.94776119 0.9922179 0.98814229 0.96146045 0.99227799 0.92391304 0.92446043 0.96993988 0.93235832] mean value: 0.9566838066212353 key: test_precision value: [0.75 0.75757576 0.76470588 0.78125 0.86666667 0.74193548 0.69444444 0.8125 0.88461538 0.80645161] mean value: 0.7860145232429387 key: train_precision value: [0.87671233 0.90714286 0.98837209 1. 1. 0.98467433 0.86440678 0.85953177 1. 0.87931034] mean value: 0.9360150505499005 key: test_recall value: [0.93103448 0.86206897 0.89655172 0.86206897 0.89655172 0.82142857 0.89285714 0.92857143 0.82142857 0.89285714] mean value: 0.8805418719211823 key: train_recall value: [1. 0.9921875 0.99609375 0.9765625 0.92578125 1. 0.9922179 1. 
0.94163424 0.9922179 ] mean value: 0.9816695038910506 key: test_roc_auc value: [0.80480296 0.78817734 0.80541872 0.80603448 0.87684729 0.77278325 0.7567734 0.86083744 0.85899015 0.8429803 ] mean value: 0.8173645320197044 key: train_roc_auc value: [0.92996109 0.94551009 0.9922103 0.98828125 0.96289062 0.9921875 0.91798395 0.91796875 0.97081712 0.92774957] mean value: 0.9545560250486381 key: test_jcc value: [0.71052632 0.67567568 0.7027027 0.69444444 0.78787879 0.63888889 0.64102564 0.76470588 0.74193548 0.73529412] mean value: 0.7093077940276582 key: train_jcc value: [0.87671233 0.90070922 0.98455598 0.9765625 0.92578125 0.98467433 0.85858586 0.85953177 0.94163424 0.87328767] mean value: 0.9182035156322301 MCC on Blind test: 0.12 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, 
predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03299141 0.03931022 0.04896212 0.03910494 0.03912568 0.0160625 0.01604271 0.01604271 0.04011297 0.03951836] mean value: 0.03272736072540283 key: score_time value: [0.01981163 0.01908088 0.02321577 0.01895833 0.01523948 0.01230335 0.01237154 0.01829481 0.01899481 0.01901555] mean value: 0.017728614807128906 key: test_mcc value: [0.96547546 0.96551724 0.8951918 0.8953202 0.8953202 0.89988258 0.86189955 0.92980296 0.96547546 0.92980296] mean value: 0.9203688385840486 key: train_mcc value: [0.96127828 0.96907736 0.97277661 0.96892956 0.96113155 0.97663743 0.96127477 0.96892768 0.96509421 0.96509421] mean value: 0.9670221668898701 key: test_accuracy value: [0.98245614 0.98245614 0.94736842 0.94736842 0.94736842 0.94736842 0.92982456 0.96491228 0.98245614 0.96491228] mean value: 0.9596491228070175 key: train_accuracy value: [0.98050682 0.98440546 0.98635478 0.98440546 0.98050682 0.98830409 0.98050682 0.98440546 0.98245614 0.98245614] mean value: 0.9834307992202729 key: test_fscore value: [0.98305085 0.98245614 0.94915254 0.94736842 0.94736842 0.94915254 0.93103448 0.96428571 0.98181818 0.96428571] mean value: 0.9599973007807762 key: train_fscore value: [0.98069498 0.98455598 0.98640777 0.98449612 0.98062016 0.98837209 0.98076923 0.98455598 0.98265896 0.98265896] mean value: 0.983579023873464 key: test_precision value: [0.96666667 1. 0.93333333 0.96428571 0.96428571 0.90322581 0.9 0.96428571 1. 
0.96428571] mean value: 0.956036866359447 key: train_precision value: [0.96946565 0.97328244 0.98069498 0.97692308 0.97307692 0.98455598 0.96958175 0.97701149 0.97328244 0.97328244] mean value: 0.9751157185652505 key: test_recall value: [1. 0.96551724 0.96551724 0.93103448 0.93103448 1. 0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9650246305418719 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.9921875 0.98828125 0.9922179 0.9922179 0.9922179 0.9922179 0.9922179 ] mean value: 0.9922026994163424 key: test_roc_auc value: [0.98214286 0.98275862 0.94704433 0.9476601 0.9476601 0.94827586 0.93041872 0.96490148 0.98214286 0.96490148] mean value: 0.9597906403940888 key: train_roc_auc value: [0.98052955 0.9844282 0.98636612 0.9844206 0.98052195 0.98829645 0.98048395 0.9843902 0.98243707 0.98243707] mean value: 0.9834311162451362 key: test_jcc value: [0.96666667 0.96551724 0.90322581 0.9 0.9 0.90322581 0.87096774 0.93103448 0.96428571 0.93103448] mean value: 0.9235957942687643 key: train_jcc value: [0.96212121 0.96958175 0.97318008 0.96946565 0.96197719 0.97701149 0.96226415 0.96958175 0.96590909 0.96590909] mean value: 0.9677001449029624 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27509356 0.29359365 0.15107393 0.30144858 0.24914145 0.15170097 0.25802326 0.17746091 0.35042024 0.2620635 ] mean value: 0.2470020055770874 key: score_time value: [0.01926804 0.01318574 0.01766825 0.01989317 0.01275682 0.01305699 0.01279974 0.02457213 0.01348686 0.01300597] mean value: 0.015969371795654295 key: test_mcc value: [0.96547546 0.96551724 0.8951918 0.8953202 0.8953202 0.89988258 0.86189955 0.92980296 0.96547546 0.92980296] mean value: 0.9203688385840486 key: train_mcc value: [0.96127828 0.96907736 0.97277661 0.96892956 0.96113155 0.97663743 0.96127477 0.96892768 0.96509421 0.96509421] mean value: 0.9670221668898701 key: test_accuracy value: [0.98245614 0.98245614 0.94736842 0.94736842 0.94736842 0.94736842 0.92982456 0.96491228 0.98245614 0.96491228] mean value: 0.9596491228070175 key: train_accuracy value: [0.98050682 0.98440546 0.98635478 0.98440546 0.98050682 0.98830409 0.98050682 0.98440546 0.98245614 0.98245614] mean value: 0.9834307992202729 key: test_fscore value: [0.98305085 0.98245614 0.94915254 0.94736842 0.94736842 0.94915254 0.93103448 0.96428571 0.98181818 0.96428571] mean value: 0.9599973007807762 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:128: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:131: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.98069498 0.98455598 0.98640777 0.98449612 0.98062016 0.98837209 0.98076923 0.98455598 0.98265896 0.98265896] mean value: 0.983579023873464 key: test_precision value: [0.96666667 1. 0.93333333 0.96428571 0.96428571 0.90322581 0.9 0.96428571 1. 0.96428571] mean value: 0.956036866359447 key: train_precision value: [0.96946565 0.97328244 0.98069498 0.97692308 0.97307692 0.98455598 0.96958175 0.97701149 0.97328244 0.97328244] mean value: 0.9751157185652505 key: test_recall value: [1. 0.96551724 0.96551724 0.93103448 0.93103448 1. 
0.96428571 0.96428571 0.96428571 0.96428571] mean value: 0.9650246305418719 key: train_recall value: [0.9921875 0.99609375 0.9921875 0.9921875 0.98828125 0.9922179 0.9922179 0.9922179 0.9922179 0.9922179 ] mean value: 0.9922026994163424 key: test_roc_auc value: [0.98214286 0.98275862 0.94704433 0.9476601 0.9476601 0.94827586 0.93041872 0.96490148 0.98214286 0.96490148] mean value: 0.9597906403940888 key: train_roc_auc value: [0.98052955 0.9844282 0.98636612 0.9844206 0.98052195 0.98829645 0.98048395 0.9843902 0.98243707 0.98243707] mean value: 0.9834311162451362 key: test_jcc value: [0.96666667 0.96551724 0.90322581 0.9 0.9 0.90322581 0.87096774 0.93103448 0.96428571 0.93103448] mean value: 0.9235957942687643 key: train_jcc value: [0.96212121 0.96958175 0.97318008 0.96946565 0.96197719 0.97701149 0.96226415 0.96958175 0.96590909 0.96590909] mean value: 0.9677001449029624 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02893591 0.03623009 0.03782225 0.03774071 0.0414505 0.03827882 0.0374217 0.03790164 0.03869176 0.03810287] mean value: 0.03725762367248535 key: score_time value: [0.01233792 0.0122304 0.01600552 0.01422644 0.01441526 0.01328301 0.01476669 0.01448369 0.01468158 0.01480675] mean value: 0.01412372589111328 key: test_mcc value: [0.86189955 0.8951918 0.71921182 0.8951918 0.7366424 0.89988258 0.8953202 0.7589669 0.7589669 0.79778885] mean value: 0.8219062797841707 key: train_mcc value: [0.88315143 0.91830593 0.91050744 0.90654547 0.90309643 0.92985363 0.91433828 0.91447603 0.89887645 0.92593212] mean value: 0.9105083215666911 key: test_accuracy value: [0.92982456 0.94736842 0.85964912 0.94736842 0.85964912 0.94736842 0.94736842 0.87719298 0.87719298 0.89473684] mean value: 0.9087719298245613 key: train_accuracy value: [0.94152047 0.95906433 0.95516569 0.95321637 0.95126706 0.96491228 0.95711501 0.95711501 0.94931774 0.96296296] mean value: 0.9551656920077972 key: test_fscore value: [0.92857143 0.94915254 0.86206897 0.94915254 0.84615385 0.94915254 0.94736842 0.88135593 0.88135593 0.9 ] mean value: 0.9094332152820572 key: train_fscore value: [0.94186047 0.95938104 0.95551257 0.95348837 0.95201536 0.96484375 0.95752896 0.95769231 0.95 0.9631068 ] mean value: 0.9555429620654722 key: test_precision value: [0.96296296 0.93333333 0.86206897 0.93333333 0.95652174 0.90322581 0.93103448 0.83870968 0.83870968 0.84375 ] mean value: 0.9003649978326249 key: train_precision value: [0.93461538 0.95019157 0.94636015 0.94615385 0.93584906 0.96862745 0.95019157 0.94676806 
0.9391635 0.96124031] mean value: 0.9479160902385434 key: test_recall value: [0.89655172 0.96551724 0.86206897 0.96551724 0.75862069 1. 0.96428571 0.92857143 0.92857143 0.96428571] mean value: 0.9233990147783251 key: train_recall value: [0.94921875 0.96875 0.96484375 0.9609375 0.96875 0.96108949 0.96498054 0.9688716 0.96108949 0.96498054] mean value: 0.9633511673151751 key: test_roc_auc value: [0.93041872 0.94704433 0.85960591 0.94704433 0.8614532 0.94827586 0.9476601 0.87807882 0.87807882 0.89593596] mean value: 0.9093596059113301 key: train_roc_auc value: [0.94153545 0.95908317 0.95518452 0.9532314 0.95130107 0.96491975 0.95709965 0.95709205 0.94929475 0.96295902] mean value: 0.9551700814688716 key: test_jcc value: [0.86666667 0.90322581 0.75757576 0.90322581 0.73333333 0.90322581 0.9 0.78787879 0.78787879 0.81818182] mean value: 0.836119257086999 key: train_jcc value: [0.89010989 0.92193309 0.91481481 0.91111111 0.90842491 0.93207547 0.91851852 0.91881919 0.9047619 0.92883895] mean value: 0.9149407844443863 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.03073525 0.89607024 0.88266683 1.02899694 0.90770841 0.94768596 0.87751889 1.00172234 0.94122171 0.88169622] mean value: 0.939602279663086 key: score_time value: [0.01449108 0.01500177 0.01603222 0.01596642 0.0146718 0.01587105 0.01488566 0.01601124 0.01503849 0.01837087] mean value: 0.01563405990600586 key: test_mcc value: [0.96551724 0.9321832 0.86189955 0.89988258 0.9321832 0.9321832 0.8615634 0.89988258 0.93202124 0.82880708] mean value: 0.9046123268422085 key: train_mcc value: [1. 0.98443556 1. 1. 0.9844054 0.98443509 1. 0.99610895 1. 1. ] mean value: 0.9949384998413872 key: test_accuracy value: [0.98245614 0.96491228 0.92982456 0.94736842 0.96491228 0.96491228 0.92982456 0.94736842 0.96491228 0.9122807 ] mean value: 0.9508771929824561 key: train_accuracy value: [1. 0.99220273 1. 1. 0.99220273 0.99220273 1. 0.99805068 1. 1. ] mean value: 0.9974658869395712 key: test_fscore value: [0.98245614 0.96428571 0.92857143 0.94545455 0.96428571 0.96551724 0.92592593 0.94915254 0.96296296 0.90566038] mean value: 0.9494272592947851 key: train_fscore value: [1. 0.9922179 1. 1. 0.9921875 0.99224806 1. 0.99805068 1. 1. ] mean value: 0.9974704143109397 key: test_precision value: [1. 1. 0.96296296 1. 1. 0.93333333 0.96153846 0.90322581 1. 0.96 ] mean value: 0.972106056428637 key: train_precision value: [1. 0.98837209 1. 1. 0.9921875 0.98841699 1. 1. 1. 1. ] mean value: 0.9968976581440244 key: test_recall value: [0.96551724 0.93103448 0.89655172 0.89655172 0.93103448 1. 0.89285714 1. 
0.92857143 0.85714286] mean value: 0.9299261083743843 key: train_recall value: [1. 0.99609375 1. 1. 0.9921875 0.99610895 1. 0.99610895 1. 1. ] mean value: 0.9980499148832684 key: test_roc_auc value: [0.98275862 0.96551724 0.93041872 0.94827586 0.96551724 0.96551724 0.92918719 0.94827586 0.96428571 0.91133005] mean value: 0.9511083743842365 key: train_roc_auc value: [1. 0.9922103 1. 1. 0.9922027 0.9921951 1. 0.99805447 1. 1. ] mean value: 0.9974662572957198 key: test_jcc value: [0.96551724 0.93103448 0.86666667 0.89655172 0.93103448 0.93333333 0.86206897 0.90322581 0.92857143 0.82758621] mean value: 0.9045590338471318 key: train_jcc value: [1. 0.98455598 1. 1. 0.98449612 0.98461538 1. 0.99610895 1. 1. ] mean value: 0.994977644261872 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, 
num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01544452 0.01079631 0.01084924 0.01048374 0.01036882 0.01137543 0.01046872 0.01074934 0.01117849 0.01048636] mean value: 0.011220097541809082 key: score_time value: [0.01254582 0.00956702 0.00937724 0.00905204 0.00924778 0.00919628 0.00945115 0.00926661 0.00934625 0.00930095] mean value: 0.009635114669799804 key: test_mcc value: [0.47938227 0.73477227 0.65018988 0.35662633 0.43842365 0.76689254 0.54377353 0.553659 0.68850906 0.68472906] mean value: 0.5896957592225097 key: train_mcc value: [0.59993634 0.57835025 0.59632528 0.59569245 0.63963001 0.67451781 0.61484606 0.59641724 0.65352889 0.68074744] mean value: 0.6229991760196517 key: test_accuracy value: [0.73684211 0.85964912 0.8245614 0.66666667 0.71929825 0.87719298 0.77192982 0.77192982 0.84210526 0.84210526] mean value: 0.7912280701754386 key: train_accuracy value: [0.79727096 0.78557505 0.79532164 0.79532164 0.81871345 0.83625731 0.80506823 0.79727096 0.8245614 0.83625731] mean value: 0.8091617933723196 key: test_fscore value: [0.76190476 0.875 0.83333333 0.72463768 0.72413793 0.8852459 0.76363636 0.78688525 0.84745763 0.84210526] mean value: 0.8044344108885885 key: train_fscore value: [0.80952381 0.80072464 0.80804388 0.80733945 0.82551595 0.84269663 0.81684982 0.78947368 0.83455882 0.84837545] mean value: 0.8183102124965754 key: test_precision value: [0.70588235 0.8 0.80645161 0.625 0.72413793 0.81818182 0.77777778 0.72727273 0.80645161 0.82758621] mean value: 0.7618742039910986 key: train_precision value: [0.76206897 0.74662162 0.75945017 0.76124567 0.79422383 0.81227437 0.7716263 0.82278481 0.79094077 
0.79124579] mean value: 0.7812482294147253 key: test_recall value: [0.82758621 0.96551724 0.86206897 0.86206897 0.72413793 0.96428571 0.75 0.85714286 0.89285714 0.85714286] mean value: 0.8562807881773399 key: train_recall value: [0.86328125 0.86328125 0.86328125 0.859375 0.859375 0.87548638 0.86770428 0.75875486 0.88326848 0.91439689] mean value: 0.8608204644941634 key: test_roc_auc value: [0.73522167 0.85775862 0.82389163 0.66317734 0.71921182 0.87869458 0.77155172 0.77339901 0.8429803 0.84236453] mean value: 0.7908251231527094 key: train_roc_auc value: [0.79739938 0.78572623 0.79545385 0.79544625 0.81879256 0.83618069 0.80494589 0.79734618 0.82444674 0.83610469] mean value: 0.8091842473249027 key: test_jcc value: [0.61538462 0.77777778 0.71428571 0.56818182 0.56756757 0.79411765 0.61764706 0.64864865 0.73529412 0.72727273] mean value: 0.6766177692648281 key: train_jcc value: [0.68 0.66767372 0.67791411 0.67692308 0.7028754 0.72815534 0.69040248 0.65217391 0.71608833 0.73667712] mean value: 0.6928883476418292 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01124644 0.0112834 0.0114224 0.01132727 0.01072431 0.01181841 0.01175499 0.01123571 0.01174855 0.01170516] mean value: 0.011426663398742676 key: score_time value: [0.00936103 0.00995636 0.00940537 0.00954866 0.00949144 0.00949359 0.00991058 0.00980997 0.00985241 0.00984097] mean value: 0.009667038917541504 key: test_mcc value: [0.65104858 0.65018988 0.57881773 0.54377353 0.51048128 0.58562417 0.62473685 0.53222729 0.72064772 0.54377353] mean value: 0.5941320574687631 key: train_mcc value: [0.66095589 0.64523042 0.66547519 0.6456446 0.66679649 0.62183277 0.66093013 0.66861729 0.64578738 0.66541423] mean value: 0.6546684397270505 key: test_accuracy value: [0.8245614 0.8245614 0.78947368 0.77192982 0.75438596 0.78947368 0.80701754 0.75438596 0.85964912 0.77192982] mean value: 0.7947368421052632 key: train_accuracy value: [0.83040936 0.82261209 0.83235867 0.82261209 0.83235867 0.81091618 0.83040936 0.83430799 0.82261209 0.83235867] mean value: 0.8270955165692008 key: test_fscore value: [0.82142857 0.83333333 0.79310345 0.77966102 0.75 0.8 0.81967213 0.78125 0.85185185 0.76363636] mean value: 0.7993936716622676 key: train_fscore value: [0.83172147 0.82261209 0.83587786 0.82533589 0.83834586 0.81165049 0.83236994 0.83495146 0.82666667 0.8365019 ] mean value: 0.8296033627312248 key: test_precision value: [0.85185185 0.80645161 0.79310345 0.76666667 0.77777778 0.75 0.75757576 0.69444444 0.88461538 0.77777778] mean value: 0.7860264721888749 key: train_precision value: [0.82375479 0.82101167 0.81716418 0.81132075 0.80797101 0.81007752 0.82442748 0.83333333 0.80970149 0.81784387] 
mean value: 0.817660610307552 key: test_recall value: [0.79310345 0.86206897 0.79310345 0.79310345 0.72413793 0.85714286 0.89285714 0.89285714 0.82142857 0.75 ] mean value: 0.8179802955665024 key: train_recall value: [0.83984375 0.82421875 0.85546875 0.83984375 0.87109375 0.81322957 0.84046693 0.83657588 0.84435798 0.85603113] mean value: 0.8421130228599222 key: test_roc_auc value: [0.82512315 0.82389163 0.78940887 0.77155172 0.75492611 0.79064039 0.80849754 0.7567734 0.85899015 0.77155172] mean value: 0.7951354679802956 key: train_roc_auc value: [0.83042771 0.82261521 0.83240364 0.82264561 0.83243403 0.81091166 0.83038971 0.83430356 0.82256961 0.83231244] mean value: 0.8271013193093385 key: test_jcc value: [0.6969697 0.71428571 0.65714286 0.63888889 0.6 0.66666667 0.69444444 0.64102564 0.74193548 0.61764706] mean value: 0.6669006452118407 key: train_jcc value: [0.71192053 0.6986755 0.71803279 0.70261438 0.72168285 0.68300654 0.71287129 0.71666667 0.70454545 0.71895425] mean value: 0.7088970233011279 MCC on Blind test: 0.47 Accuracy on Blind test: 0.72 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01082969 0.011024 0.01055241 0.0112505 0.01139307 0.01134706 0.01113677 0.01125765 0.01113772 0.01104569] mean value: 0.011097455024719238 key: score_time value: [0.01366639 0.01299024 0.01354313 0.01347542 0.01680923 0.01358867 0.01365185 0.01317978 0.01348186 0.01677513] mean value: 0.014116168022155762 key: test_mcc value: [0.58562417 0.59358067 0.40320623 0.37898808 0.33374384 0.54433498 0.47938227 0.23201635 0.43881637 0.4464279 ] mean value: 0.443612085640124 key: train_mcc value: [0.68941296 0.71041292 0.67593155 0.70540158 0.65887319 0.69002013 0.69956718 0.66878238 0.67386518 0.68754157] mean value: 0.6859808624868353 key: test_accuracy value: [0.78947368 0.78947368 0.70175439 0.68421053 0.66666667 0.77192982 0.73684211 0.61403509 0.71929825 0.71929825] mean value: 0.7192982456140351 key: train_accuracy value: [0.84405458 0.85380117 0.83625731 0.85185185 0.82846004 0.84210526 0.84795322 0.83235867 0.83625731 0.84210526] mean value: 0.8415204678362573 key: test_fscore value: [0.77777778 0.76923077 0.71186441 0.65384615 0.66666667 0.77192982 0.70588235 0.63333333 0.7037037 0.68 ] mean value: 0.7074234988840645 key: train_fscore value: [0.83870968 0.84662577 0.82716049 0.84615385 0.82113821 0.83160083 0.84016393 0.82304527 0.8313253 0.83435583] mean value: 0.8340279158596092 key: test_precision value: [0.84 0.86956522 0.7 0.73913043 0.67857143 0.75862069 0.7826087 0.59375 0.73076923 0.77272727] mean value: 0.7465742969549192 key: train_precision value: [0.86666667 0.88841202 0.87391304 0.87815126 0.8559322 0.89285714 0.88744589 0.87336245 0.85892116 
0.87931034] mean value: 0.8754972173577531 key: test_recall value: [0.72413793 0.68965517 0.72413793 0.5862069 0.65517241 0.78571429 0.64285714 0.67857143 0.67857143 0.60714286] mean value: 0.6772167487684729 key: train_recall value: [0.8125 0.80859375 0.78515625 0.81640625 0.7890625 0.77821012 0.79766537 0.77821012 0.80544747 0.79377432] mean value: 0.7965026142996109 key: test_roc_auc value: [0.79064039 0.79125616 0.70135468 0.68596059 0.66687192 0.77216749 0.73522167 0.61514778 0.71859606 0.71736453] mean value: 0.7194581280788177 key: train_roc_auc value: [0.84399319 0.85371322 0.83615789 0.85178289 0.82838339 0.84223006 0.84805143 0.83246443 0.83631749 0.84219966] mean value: 0.8415293652723735 key: test_jcc value: [0.63636364 0.625 0.55263158 0.48571429 0.5 0.62857143 0.54545455 0.46341463 0.54285714 0.51515152] mean value: 0.5495158767206264 key: train_jcc value: [0.72222222 0.73404255 0.70526316 0.73333333 0.69655172 0.71174377 0.72438163 0.6993007 0.71134021 0.71578947] mean value: 0.7153968767633878 MCC on Blind test: 0.12 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02510953 0.02347946 0.02306366 0.02614355 0.02323389 0.02468395 0.02582169 0.02323365 0.02800202 0.02702546] mean value: 0.024979686737060545 key: score_time value: [0.01257253 0.01330853 0.01220036 0.01338387 0.01318002 0.01259565 0.01303172 0.01234198 0.01249027 0.0124352 ] mean value: 0.012754011154174804 key: test_mcc value: [0.75808552 0.93202124 0.65018988 0.75808552 0.50862069 0.82942474 0.83797038 0.56277738 0.7366424 0.72706729] mean value: 0.7300885045397139 key: train_mcc value: [0.80503807 0.82685702 0.82430977 0.82136234 0.82365101 0.79844785 0.80135931 0.83061083 0.7827491 0.83061083] mean value: 0.8144996123456597 key: test_accuracy value: [0.87719298 0.96491228 0.8245614 0.87719298 0.75438596 0.9122807 0.9122807 0.77192982 0.85964912 0.85964912] mean value: 0.8614035087719298 key: train_accuracy value: [0.9005848 0.9122807 0.91033138 0.90838207 0.90838207 0.89668616 0.89863548 0.9122807 0.88888889 0.9122807 ] mean value: 0.9048732943469785 key: test_fscore value: [0.8852459 0.96666667 0.83333333 0.8852459 0.75862069 0.91525424 0.91803279 0.79365079 0.87096774 0.86666667] mean value: 0.8693684719360186 key: train_fscore value: [0.90502793 0.91525424 0.9141791 0.91280148 0.91376147 0.90239411 0.9037037 0.91743119 0.89502762 0.91743119] mean value: 0.9097012046994798 key: test_precision value: [0.84375 0.93548387 0.80645161 0.84375 0.75862069 0.87096774 0.84848485 0.71428571 0.79411765 0.8125 ] mean value: 0.822841212529101 key: train_precision value: [0.86476868 0.88363636 0.875 0.86925795 0.8615917 0.85664336 0.86219081 0.86805556 0.84965035 
0.86805556] mean value: 0.8658850323067816 key: test_recall value: [0.93103448 1. 0.86206897 0.93103448 0.75862069 0.96428571 1. 0.89285714 0.96428571 0.92857143] mean value: 0.9232758620689655 key: train_recall value: [0.94921875 0.94921875 0.95703125 0.9609375 0.97265625 0.95330739 0.94941634 0.97276265 0.94552529 0.97276265] mean value: 0.9582836819066147 key: test_roc_auc value: [0.87623153 0.96428571 0.82389163 0.87623153 0.75431034 0.91317734 0.9137931 0.77401478 0.8614532 0.86083744] mean value: 0.8618226600985222 key: train_roc_auc value: [0.90067941 0.91235257 0.91042224 0.90848431 0.90850711 0.89657557 0.8985363 0.91216257 0.88877827 0.91216257] mean value: 0.9048660931420234 key: test_jcc value: [0.79411765 0.93548387 0.71428571 0.79411765 0.61111111 0.84375 0.84848485 0.65789474 0.77142857 0.76470588] mean value: 0.773538002959068 key: train_jcc value: [0.82653061 0.84375 0.8419244 0.83959044 0.84121622 0.82214765 0.82432432 0.84745763 0.81 0.84745763] mean value: 0.8344398900340875 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.04724884 2.13221836 2.09088874 1.98468447 2.13889074 2.03775096 2.05242133 2.10154319 2.35233235 2.07667589] mean value: 2.1014654874801635 key: score_time value: [0.0151341 0.01452017 0.01477504 0.02222133 0.01449585 0.0148468 0.01498914 0.01492929 0.01478672 0.01493883] mean value: 0.015563726425170898 key: test_mcc value: [0.9321832 0.96551724 0.86189955 0.86851042 0.79778885 0.9321832 0.86851042 0.86851042 0.89952865 0.78940887] mean value: 0.8784040805411836 key: train_mcc value: [0.99610889 1. 1. 0.99610889 1. 1. 0.99610895 1. 0.99610895 0.99610895] mean value: 0.9980544629026649 key: test_accuracy value: [0.96491228 0.98245614 0.92982456 0.92982456 0.89473684 0.96491228 0.92982456 0.92982456 0.94736842 0.89473684] mean value: 0.9368421052631579 key: train_accuracy value: [0.99805068 1. 1. 0.99805068 1. 1. 0.99805068 1. 0.99805068 0.99805068] mean value: 0.9990253411306043 key: test_fscore value: [0.96428571 0.98245614 0.92857143 0.92592593 0.88888889 0.96551724 0.93333333 0.93333333 0.94339623 0.89285714] mean value: 0.9358565375341049 key: train_fscore value: [0.99804305 1. 1. 0.99804305 1. 1. 0.99805068 1. 0.99805068 0.99805068] mean value: 0.9990238152458772 key: test_precision value: [1. 1. 
0.96296296 1. 0.96 0.93333333 0.875 0.875 1. 0.89285714] mean value: 0.9499153439153439 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93103448 0.96551724 0.89655172 0.86206897 0.82758621 1. 1. 1. 0.89285714 0.89285714] mean value: 0.9268472906403941 key: train_recall value: [0.99609375 1. 1. 0.99609375 1. 1. 0.99610895 1. 0.99610895 0.99610895] mean value: 0.9980514348249028 key: test_roc_auc value: [0.96551724 0.98275862 0.93041872 0.93103448 0.89593596 0.96551724 0.93103448 0.93103448 0.94642857 0.89470443] mean value: 0.9374384236453202 key: train_roc_auc value: [0.99804688 1. 1. 0.99804688 1. 1. 0.99805447 1. 0.99805447 0.99805447] mean value: 0.9990257174124514 key: test_jcc value: [0.93103448 0.96551724 0.86666667 0.86206897 0.8 0.93333333 0.875 0.875 0.89285714 0.80645161] mean value: 0.8807929445415541 key: train_jcc value: [0.99609375 1. 1. 0.99609375 1. 1. 0.99610895 1. 0.99610895 0.99610895] mean value: 0.9980514348249028 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02580142 0.02164054 0.02364039 0.01933789 0.02146482 0.02186465 0.0194304 0.0223105 0.02421784 0.02074671] mean value: 0.02204551696777344 key: score_time value: [0.01239204 0.00951767 0.0091331 0.00935173 0.00931621 0.00931072 0.00933576 0.00936174 0.00928426 0.00949907] mean value: 0.009650230407714844 key: test_mcc value: [0.96551724 0.96551724 1. 0.9321832 0.96551724 0.82490815 0.96551724 0.82512315 1. 1. ] mean value: 0.9444283466299732 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 1. 0.96491228 0.98245614 0.9122807 0.98245614 0.9122807 1. 1. ] mean value: 0.9719298245614034 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 1. 0.96428571 0.98245614 0.90909091 0.98245614 0.9122807 1. 1. ] mean value: 0.9715481886534518 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 0.92592593 0.96551724 0.89655172 1. 1. ] mean value: 0.9787994891443167 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.93103448 0.96551724 0.89285714 1. 0.92857143 1. 1. ] mean value: 0.9649014778325123 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 1. 0.96551724 0.98275862 0.91194581 0.98275862 0.91256158 1. 1. ] mean value: 0.9721059113300493 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 1. 0.93103448 0.96551724 0.83333333 0.96551724 0.83870968 1. 1. ] mean value: 0.9465146459028551 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12869287 0.13245654 0.13260221 0.13266063 0.13408566 0.12997079 0.12685227 0.12470031 0.12397742 0.1276443 ] mean value: 0.1293642997741699 key: score_time value: [0.01836562 0.02021575 0.01952505 0.01995015 0.01989484 0.01940989 0.01952267 0.01921463 0.01887035 0.01998615] mean value: 0.019495511054992677 key: test_mcc value: [0.9321832 0.96551724 0.82880708 0.93202124 0.7589669 0.89988258 0.86851042 0.7589669 1. 0.82942474] mean value: 0.8774280298980771 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.98245614 0.9122807 0.96491228 0.87719298 0.94736842 0.92982456 0.87719298 1. 0.9122807 ] mean value: 0.9368421052631579 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.96428571 0.98245614 0.91803279 0.96666667 0.87272727 0.94915254 0.93333333 0.88135593 1. 0.91525424] mean value: 0.9383264626113517 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.875 0.93548387 0.92307692 0.90322581 0.875 0.83870968 1. 0.87096774] mean value: 0.9221464019851117 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93103448 0.96551724 0.96551724 1. 0.82758621 1. 1. 0.92857143 1. 0.96428571] mean value: 0.9582512315270936 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96551724 0.98275862 0.91133005 0.96428571 0.87807882 0.94827586 0.93103448 0.87807882 1. 0.91317734] mean value: 0.9372536945812808 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.96551724 0.84848485 0.93548387 0.77419355 0.90322581 0.875 0.78787879 1. 0.84375 ] mean value: 0.8864568586308019 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.7 Accuracy on Blind test: 0.86 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01136732 0.01122093 0.01079583 0.01113224 0.01157928 0.01075482 0.01083755 0.01173544 0.01077223 0.01102138] mean value: 0.011121702194213868 key: score_time value: [0.00963473 0.00953341 0.00986958 0.00951362 0.00952601 0.00972843 0.00951052 0.00955749 0.00977182 0.00967717] mean value: 0.009632277488708495 key: test_mcc value: [0.64889453 0.92980296 0.93202124 0.65104858 0.50182897 0.75492611 0.76550573 0.68434084 0.96547546 0.72064772] mean value: 0.7554492130393143 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.80701754 0.96491228 0.96491228 0.8245614 0.73684211 0.87719298 0.87719298 0.84210526 0.98245614 0.85964912] mean value: 0.8736842105263157 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7755102 0.96551724 0.96666667 0.82142857 0.69387755 0.87719298 0.8627451 0.83636364 0.98181818 0.85185185] mean value: 0.8632971985105615 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95 0.96551724 0.93548387 0.85185185 0.85 0.86206897 0.95652174 0.85185185 1. 0.88461538] mean value: 0.9107910905313816 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.65517241 0.96551724 1. 0.79310345 0.5862069 0.89285714 0.78571429 0.82142857 0.96428571 0.82142857] mean value: 0.8285714285714285 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.80972906 0.96490148 0.96428571 0.82512315 0.73953202 0.87746305 0.87561576 0.84174877 0.98214286 0.85899015] mean value: 0.8739532019704433 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.63333333 0.93333333 0.93548387 0.6969697 0.53125 0.78125 0.75862069 0.71875 0.96428571 0.74193548] mean value: 0.769521212241596 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.78 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.81794739 1.80400729 1.78327441 1.78702116 1.78159261 1.77446675 1.77745032 1.78069067 1.81510329 1.79544592] mean value: 1.7916999816894532 key: score_time value: [0.09758425 0.09366488 0.09205317 0.09208465 0.09284616 0.09224868 0.09185553 0.09374571 0.09647107 0.14306402] mean value: 0.09856181144714356 key: test_mcc value: [1. 1. 0.96547546 0.96551724 0.8951918 0.9321832 0.9321832 0.89988258 1. 0.96551724] mean value: 0.9555950718602617 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 0.98245614 0.98245614 0.94736842 0.96491228 0.96491228 0.94736842 1. 0.98245614] mean value: 0.9771929824561403 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 0.98305085 0.98245614 0.94915254 0.96551724 0.96551724 0.94915254 1. 0.98245614] mean value: 0.9777302695663765 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96666667 1. 0.93333333 0.93333333 0.93333333 0.90322581 1. 0.96551724] mean value: 0.963540971449759 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.96551724 0.96551724 1. 1. 1. 1. 1. ] mean value: 0.993103448275862 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.98214286 0.98275862 0.94704433 0.96551724 0.96551724 0.94827586 1. 0.98275862] mean value: 0.9774014778325123 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 1. 0.96666667 0.96551724 0.90322581 0.93333333 0.93333333 0.90322581 1. 0.96551724] mean value: 0.957081942899518 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.94383526 0.9610858 1.04445434 0.96728754 1.01617026 0.99173713 0.96424818 1.04056263 0.96362472 0.97514272] mean value: 0.9868148565292358 key: score_time value: [0.2423315 0.27159905 0.20848846 0.27036047 0.26195121 0.25400162 0.24411821 0.23739958 0.24373174 0.18249941] mean value: 0.24164812564849852 key: test_mcc value: [0.96551724 0.96547546 0.93202124 0.96551724 0.85960591 0.89988258 0.96551724 0.86189955 1. 
0.9321832 ] mean value: 0.9347619656218213 key: train_mcc value: [0.96907736 0.96907736 0.9652735 0.97289533 0.97307329 0.97672617 0.97289329 0.98057338 0.97289329 0.98057338] mean value: 0.9733056339025454 key: test_accuracy value: [0.98245614 0.98245614 0.96491228 0.98245614 0.92982456 0.94736842 0.98245614 0.92982456 1. 0.96491228] mean value: 0.9666666666666667 key: train_accuracy value: [0.98440546 0.98440546 0.98245614 0.98635478 0.98635478 0.98830409 0.98635478 0.99025341 0.98635478 0.99025341] mean value: 0.9865497076023392 key: test_fscore value: [0.98245614 0.98305085 0.96666667 0.98245614 0.93103448 0.94915254 0.98245614 0.93103448 1. 0.96551724] mean value: 0.9673824684446358 key: train_fscore value: [0.98455598 0.98455598 0.98265896 0.98646035 0.98651252 0.98841699 0.98651252 0.99032882 0.98651252 0.99032882] mean value: 0.9866843477715449 key: test_precision value: [1. 0.96666667 0.93548387 1. 0.93103448 0.90322581 0.96551724 0.9 1. 0.93333333] mean value: 0.9535261401557286 key: train_precision value: [0.97328244 0.97328244 0.96958175 0.97701149 0.97338403 0.98084291 0.97709924 0.98461538 0.97709924 0.98461538] mean value: 0.9770814313607344 key: test_recall value: [0.96551724 1. 1. 0.96551724 0.93103448 1. 1. 0.96428571 1. 1. ] mean value: 0.9826354679802956 key: train_recall value: [0.99609375 0.99609375 0.99609375 0.99609375 1. 0.99610895 0.99610895 0.99610895 0.99610895 0.99610895] mean value: 0.9964919747081712 key: test_roc_auc value: [0.98275862 0.98214286 0.96428571 0.98275862 0.92980296 0.94827586 0.98275862 0.93041872 1. 
0.96551724] mean value: 0.9668719211822661 key: train_roc_auc value: [0.9844282 0.9844282 0.98248267 0.98637372 0.98638132 0.98828885 0.98633572 0.99024197 0.98633572 0.99024197] mean value: 0.9865538363326849 key: test_jcc value: [0.96551724 0.96666667 0.93548387 0.96551724 0.87096774 0.90322581 0.96551724 0.87096774 1. 0.93333333] mean value: 0.9377196885428254 key: train_jcc value: [0.96958175 0.96958175 0.96590909 0.97328244 0.97338403 0.97709924 0.97338403 0.98084291 0.97338403 0.98084291] mean value: 0.9737292183406805 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), 
('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02731586 0.01124978 0.01125789 0.011729 0.01140189 0.0112772 0.01180577 0.0113914 0.01137853 0.0109787 ] mean value: 0.012978601455688476 key: score_time value: [0.01053786 0.00988889 0.00988317 0.00965619 0.00942922 0.00986814 0.00963902 0.00995588 0.01001787 0.00911331] mean value: 0.009798955917358399 key: test_mcc value: [0.65104858 0.65018988 0.57881773 0.54377353 0.51048128 0.58562417 0.62473685 0.53222729 0.72064772 0.54377353] mean value: 0.5941320574687631 key: train_mcc value: [0.66095589 0.64523042 0.66547519 0.6456446 0.66679649 0.62183277 0.66093013 0.66861729 0.64578738 0.66541423] mean value: 
0.6546684397270505 key: test_accuracy value: [0.8245614 0.8245614 0.78947368 0.77192982 0.75438596 0.78947368 0.80701754 0.75438596 0.85964912 0.77192982] mean value: 0.7947368421052632 key: train_accuracy value: [0.83040936 0.82261209 0.83235867 0.82261209 0.83235867 0.81091618 0.83040936 0.83430799 0.82261209 0.83235867] mean value: 0.8270955165692008 key: test_fscore value: [0.82142857 0.83333333 0.79310345 0.77966102 0.75 0.8 0.81967213 0.78125 0.85185185 0.76363636] mean value: 0.7993936716622676 key: train_fscore value: [0.83172147 0.82261209 0.83587786 0.82533589 0.83834586 0.81165049 0.83236994 0.83495146 0.82666667 0.8365019 ] mean value: 0.8296033627312248 key: test_precision value: [0.85185185 0.80645161 0.79310345 0.76666667 0.77777778 0.75 0.75757576 0.69444444 0.88461538 0.77777778] mean value: 0.7860264721888749 key: train_precision value: [0.82375479 0.82101167 0.81716418 0.81132075 0.80797101 0.81007752 0.82442748 0.83333333 0.80970149 0.81784387] mean value: 0.817660610307552 key: test_recall value: [0.79310345 0.86206897 0.79310345 0.79310345 0.72413793 0.85714286 0.89285714 0.89285714 0.82142857 0.75 ] mean value: 0.8179802955665024 key: train_recall value: [0.83984375 0.82421875 0.85546875 0.83984375 0.87109375 0.81322957 0.84046693 0.83657588 0.84435798 0.85603113] mean value: 0.8421130228599222 key: test_roc_auc value: [0.82512315 0.82389163 0.78940887 0.77155172 0.75492611 0.79064039 0.80849754 0.7567734 0.85899015 0.77155172] mean value: 0.7951354679802956 key: train_roc_auc value: [0.83042771 0.82261521 0.83240364 0.82264561 0.83243403 0.81091166 0.83038971 0.83430356 0.82256961 0.83231244] mean value: 0.8271013193093385 key: test_jcc value: [0.6969697 0.71428571 0.65714286 0.63888889 0.6 0.66666667 0.69444444 0.64102564 0.74193548 0.61764706] mean value: 0.6669006452118407 key: train_jcc value: [0.71192053 0.6986755 0.71803279 0.70261438 0.72168285 0.68300654 0.71287129 0.71666667 0.70454545 0.71895425] mean value: 0.7088970233011279 MCC 
on Blind test: 0.47 Accuracy on Blind test: 0.72 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, 
validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10117292 0.06557846 0.07522464 0.07597613 0.23702955 0.07183504 0.07094789 0.08759308 0.06865072 0.07027149] mean value: 0.09242799282073974 key: score_time value: [0.01112437 0.01097035 0.01136065 0.01136637 0.01188517 0.01121473 0.01161575 0.01136327 0.01081252 0.0108099 ] mean value: 0.011252307891845703 key: test_mcc value: [0.96551724 1. 1. 0.96551724 0.96551724 0.92980296 0.96551724 0.8953202 1. 1. ] mean value: 0.9687192118226601 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 
1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 1. 0.98245614 0.98245614 0.96491228 0.98245614 0.94736842 1. 1. ] mean value: 0.9842105263157894 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 1. 0.98245614 0.98245614 0.96428571 0.98245614 0.94736842 1. 1. ] mean value: 0.9841478696741854 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 0.96428571 0.96551724 0.93103448 1. 1. ] mean value: 0.9860837438423645 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 1. 1. 0.96551724 0.96551724 0.96428571 1. 0.96428571 1. 1. ] mean value: 0.982512315270936 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 1. 0.98275862 0.98275862 0.96490148 0.98275862 0.9476601 1. 1. ] mean value: 0.9843596059113301 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 1. 1. 0.96551724 0.96551724 0.93103448 0.96551724 0.9 1. 1. ] mean value: 0.9693103448275863 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04481077 0.05475616 0.04192662 0.06302857 0.05291224 0.07026553 0.07563853 0.05653453 0.07711101 0.05302048] mean value: 0.05900044441223144 key: score_time value: [0.01975703 0.01240492 0.01247978 0.0125854 0.01426888 0.02779078 0.01338577 0.01489663 0.02277255 0.01260567] mean value: 0.0162947416305542 key: test_mcc value: [0.9321832 0.82942474 0.78940887 0.89988258 0.8953202 0.85960591 0.96547546 0.68472906 1. 0.72133224] mean value: 0.857736224960981 key: train_mcc value: [0.96884072 0.97289533 0.96884072 0.96491975 0.96884072 0.9688108 0.9611292 0.95717934 0.96883978 0.97277537] mean value: 0.9673071727588148 key: test_accuracy value: [0.96491228 0.9122807 0.89473684 0.94736842 0.94736842 0.92982456 0.98245614 0.84210526 1. 0.85964912] mean value: 0.9280701754385965 key: train_accuracy value: [0.98440546 0.98635478 0.98440546 0.98245614 0.98440546 0.98440546 0.98050682 0.9785575 0.98440546 0.98635478] mean value: 0.9836257309941521 key: test_fscore value: [0.96428571 0.90909091 0.89655172 0.94545455 0.94736842 0.92857143 0.98181818 0.84210526 1. 
0.86206897] mean value: 0.9277315153086478 key: train_fscore value: [0.9844358 0.98646035 0.9844358 0.98245614 0.9844358 0.9844358 0.98069498 0.9787234 0.98449612 0.98646035] mean value: 0.9837034536318615 key: test_precision value: [1. 0.96153846 0.89655172 1. 0.96428571 0.92857143 1. 0.82758621 1. 0.83333333] mean value: 0.9411866868763421 key: train_precision value: [0.98062016 0.97701149 0.98062016 0.98054475 0.98062016 0.9844358 0.97318008 0.97307692 0.98069498 0.98076923] mean value: 0.9791573715285722 key: test_recall value: [0.93103448 0.86206897 0.89655172 0.89655172 0.93103448 0.92857143 0.96428571 0.85714286 1. 0.89285714] mean value: 0.9160098522167488 key: train_recall value: [0.98828125 0.99609375 0.98828125 0.984375 0.98828125 0.9844358 0.98832685 0.9844358 0.98832685 0.9922179 ] mean value: 0.9883055690661479 key: test_roc_auc value: [0.96551724 0.91317734 0.89470443 0.94827586 0.9476601 0.92980296 0.98214286 0.84236453 1. 0.86022167] mean value: 0.9283866995073892 key: train_roc_auc value: [0.984413 0.98637372 0.984413 0.98245987 0.984413 0.9844054 0.98049155 0.97854602 0.9843978 0.98634332] mean value: 0.983625668774319 key: test_jcc value: [0.93103448 0.83333333 0.8125 0.89655172 0.9 0.86666667 0.96428571 0.72727273 1. 
0.75757576] mean value: 0.8689220406030751 key: train_jcc value: [0.96934866 0.97328244 0.96934866 0.96551724 0.96934866 0.96934866 0.96212121 0.95833333 0.96946565 0.97328244] mean value: 0.9679396957200327 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01195359 0.01182127 0.01160693 0.01116824 0.01125407 0.01143885 0.01075578 0.01118302 0.0114491 0.01059365] mean value: 0.011322450637817384 key: score_time value: [0.01034856 0.01003671 0.00991583 0.00989461 0.00985289 0.00991845 0.00990295 0.00980353 0.00983691 0.00996399] mean value: 0.009947443008422851 key: test_mcc value: [0.68434084 0.85960591 0.58076493 0.6166424 0.40394089 0.71921182 0.68472906 0.50182897 0.7589669 0.65104858] mean value: 0.6461080302281371 key: train_mcc value: [0.68136015 0.67760831 0.68889059 0.69292741 0.69025624 0.69758394 0.69286683 0.65887319 0.67126183 0.70503989] mean value: 0.6856668379759446 key: test_accuracy value: [0.84210526 0.92982456 0.78947368 0.80701754 0.70175439 0.85964912 0.84210526 0.73684211 0.87719298 0.8245614 ] mean value: 0.8210526315789474 key: train_accuracy value: [0.84015595 0.83820663 0.84405458 0.8460039 0.84405458 0.84795322 0.8460039 0.82846004 0.83430799 0.85185185] mean value: 0.8421052631578947 key: 
test_fscore value: [0.84745763 0.93103448 0.78571429 0.81967213 0.70175439 0.85714286 0.84210526 0.76923077 0.88135593 0.82758621] mean value: 0.8263053941335466 key: train_fscore value: [0.84410646 0.84250474 0.84732824 0.84952381 0.84962406 0.85338346 0.85009488 0.83520599 0.84171322 0.85660377] mean value: 0.8470088644663055 key: test_precision value: [0.83333333 0.93103448 0.81481481 0.78125 0.71428571 0.85714286 0.82758621 0.67567568 0.83870968 0.8 ] mean value: 0.8073832762326922 key: train_precision value: [0.82222222 0.81918819 0.82835821 0.82899628 0.81884058 0.82545455 0.82962963 0.80505415 0.80714286 0.83150183] mean value: 0.8216388500650803 key: test_recall value: [0.86206897 0.93103448 0.75862069 0.86206897 0.68965517 0.85714286 0.85714286 0.89285714 0.92857143 0.85714286] mean value: 0.8496305418719212 key: train_recall value: [0.8671875 0.8671875 0.8671875 0.87109375 0.8828125 0.88326848 0.87159533 0.86770428 0.87937743 0.88326848] mean value: 0.8740682757782101 key: test_roc_auc value: [0.84174877 0.92980296 0.79002463 0.80603448 0.70197044 0.85960591 0.84236453 0.73953202 0.87807882 0.82512315] mean value: 0.8214285714285714 key: train_roc_auc value: [0.84020854 0.83826301 0.84409959 0.84605271 0.84412999 0.84788424 0.84595392 0.82838339 0.83421997 0.85179049] mean value: 0.8420985834143969 key: test_jcc value: [0.73529412 0.87096774 0.64705882 0.69444444 0.54054054 0.75 0.72727273 0.625 0.78787879 0.70588235] mean value: 0.7084339536189631 key: train_jcc value: [0.73026316 0.72786885 0.73509934 0.7384106 0.73856209 0.7442623 0.73927393 0.7170418 0.7266881 0.74917492] mean value: 0.7346645079135289 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02216458 0.02373385 0.02363467 0.02583838 0.0320487 0.0309329 0.02790904 0.02842832 0.02639103 0.02799988] mean value: 0.026908135414123534 key: score_time value: [0.01075006 0.01167011 0.01230669 0.01222515 0.01232409 0.01231933 0.01234722 0.01239252 0.01221395 0.01230645] mean value: 0.012085556983947754 key: test_mcc value: [0.72242731 0.8953202 0.60819237 0.89988258 0.89988258 0.8953202 0.72242731 0.9321832 0.96547546 0.79778885] mean value: 0.8338900041513282 key: train_mcc value: [0.74613112 0.96509685 0.75908545 0.94169697 0.96127477 0.98057338 0.79621952 0.9844054 0.94645043 0.96147894] mean value: 0.9042412834902326 key: test_accuracy value: [0.84210526 0.94736842 0.77192982 0.94736842 0.94736842 0.94736842 0.84210526 0.96491228 0.98245614 0.89473684] mean value: 0.9087719298245613 key: train_accuracy value: [0.85769981 0.98245614 0.86549708 0.97076023 0.98050682 0.99025341 0.88888889 0.99220273 0.97270955 0.98050682] mean value: 0.9481481481481481 key: test_fscore value: [0.81632653 0.94736842 0.81690141 0.94545455 0.94545455 0.94736842 0.86153846 0.96551724 0.98181818 0.9 ] mean value: 0.9127747756813257 key: train_fscore value: [0.83371298 0.98259188 0.88123924 0.9704142 0.98023715 0.99032882 0.89982425 0.9922179 0.97338403 0.98084291] mean value: 0.9484793372602178 key: test_precision value: [1. 0.96428571 0.69047619 1. 1. 0.93103448 0.75675676 0.93333333 1. 
0.84375 ] mean value: 0.9119636477610615 key: train_precision value: [1. 0.97318008 0.78769231 0.98007968 0.992 0.98461538 0.82051282 0.9922179 0.95167286 0.96603774] mean value: 0.9448008767859039 key: test_recall value: [0.68965517 0.93103448 1. 0.89655172 0.89655172 0.96428571 1. 1. 0.96428571 0.96428571] mean value: 0.9306650246305419 key: train_recall value: [0.71484375 0.9921875 1. 0.9609375 0.96875 0.99610895 0.99610895 0.9922179 0.99610895 0.99610895] mean value: 0.9613372446498054 key: test_roc_auc value: [0.84482759 0.9476601 0.76785714 0.94827586 0.94827586 0.9476601 0.84482759 0.96551724 0.98214286 0.89593596] mean value: 0.9092980295566503 key: train_roc_auc value: [0.85742188 0.98247507 0.86575875 0.97074112 0.98048395 0.99024197 0.88867947 0.9922027 0.97266385 0.98047635] mean value: 0.9481145124027237 key: test_jcc value: [0.68965517 0.9 0.69047619 0.89655172 0.89655172 0.9 0.75675676 0.93333333 0.96428571 0.81818182] mean value: 0.8445792433723468 key: train_jcc value: [0.71484375 0.96577947 0.78769231 0.94252874 0.96124031 0.98084291 0.81789137 0.98455598 0.94814815 0.96240602] mean value: 0.9065929004503658 MCC on Blind test: 0.88 Accuracy on Blind test: 0.94 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0179286 0.02019238 0.01859355 0.02175641 0.02475739 0.01901031 0.01926279 0.01868582 0.02003837 0.01949239] mean value: 0.019971799850463868 key: score_time value: [0.01125741 0.01217079 0.01214743 0.01216531 0.01229024 0.01215935 0.01220012 0.01220322 0.01220989 0.01215482] mean value: 0.01209585666656494 key: test_mcc value: [0.64058163 0.75808552 0.7589669 0.8615634 0.9321832 0.8953202 0.79778885 0.82512315 0.92980296 0.85960591] mean value: 0.8259021726142358 key: train_mcc value: [0.66233052 0.9243158 0.88990958 0.89903527 0.9766081 0.94949987 0.95367737 0.93407434 0.95367737 0.95712245] mean value: 0.9100250676726173 key: test_accuracy value: [0.78947368 0.87719298 0.87719298 0.92982456 0.96491228 0.94736842 0.89473684 0.9122807 0.96491228 0.92982456] mean value: 0.9087719298245613 key: train_accuracy value: [0.80506823 0.96101365 0.94346979 0.94736842 0.98830409 0.97465887 0.97660819 0.9668616 0.97660819 0.9785575 ] mean value: 0.9518518518518518 key: test_fscore value: [0.73913043 0.8852459 0.87272727 0.93333333 0.96428571 0.94736842 0.9 0.9122807 0.96428571 0.92857143] mean value: 0.9047228922432434 key: train_fscore value: [0.75728155 0.96226415 0.94093686 0.94972067 0.98828125 0.97445972 0.97701149 0.96646943 0.97701149 0.9785575 ] mean value: 0.9471994134614119 key: test_precision value: [1. 0.84375 0.92307692 0.90322581 1. 0.93103448 0.84375 0.89655172 0.96428571 0.92857143] mean value: 0.9234246079282231 key: train_precision value: [1. 
0.93065693 0.98297872 0.90747331 0.98828125 0.98412698 0.96226415 0.98 0.96226415 0.98046875] mean value: 0.9678514253333143 key: test_recall value: [0.5862069 0.93103448 0.82758621 0.96551724 0.93103448 0.96428571 0.96428571 0.92857143 0.96428571 0.92857143] mean value: 0.8991379310344828 key: train_recall value: [0.609375 0.99609375 0.90234375 0.99609375 0.98828125 0.96498054 0.9922179 0.95330739 0.9922179 0.9766537 ] mean value: 0.9371564931906615 key: test_roc_auc value: [0.79310345 0.87623153 0.87807882 0.92918719 0.96551724 0.9476601 0.89593596 0.91256158 0.96490148 0.92980296] mean value: 0.9092980295566503 key: train_roc_auc value: [0.8046875 0.96108189 0.94338977 0.94746322 0.98830405 0.97467777 0.9765777 0.96688807 0.9765777 0.97856122] mean value: 0.951820890077821 key: test_jcc value: [0.5862069 0.79411765 0.77419355 0.875 0.93103448 0.9 0.81818182 0.83870968 0.93103448 0.86666667] mean value: 0.8315145219782726 key: train_jcc value: [0.609375 0.92727273 0.88846154 0.90425532 0.97683398 0.95019157 0.95505618 0.9351145 0.95505618 0.95801527] mean value: 0.9059632263141333 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18768001 0.18640518 0.19032073 0.1912744 0.18816471 0.18460417 0.18546081 0.1917305 0.19130564 0.19142318] mean value: 0.188836932182312 key: score_time value: [0.01549673 0.01700187 0.01743078 0.0170064 0.01694393 0.01691818 0.01700211 0.01698422 0.01711059 0.01660037] mean value: 0.016849517822265625 key: test_mcc value: [0.96551724 0.96551724 0.96547546 0.96551724 0.92980296 0.8953202 0.96551724 0.8953202 1. 0.93202124] mean value: 0.9480009013217541 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 0.98245614 0.98245614 0.96491228 0.94736842 0.98245614 0.94736842 1. 0.96491228] mean value: 0.9736842105263157 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 0.98305085 0.98245614 0.96551724 0.94736842 0.98245614 0.94736842 1. 0.96296296] mean value: 0.9736092455308673 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96666667 1. 
0.96551724 0.93103448 0.96551724 0.93103448 1. 1. ] mean value: 0.9759770114942529 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.96551724 0.96551724 0.96428571 1. 0.96428571 1. 0.92857143] mean value: 0.9719211822660099 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 0.98214286 0.98275862 0.96490148 0.9476601 0.98275862 0.9476601 1. 0.96428571] mean value: 0.973768472906404 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 0.96666667 0.96551724 0.93333333 0.9 0.96551724 0.9 1. 0.92857143] mean value: 0.949064039408867 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05879712 0.06589222 0.06732559 0.0758357 0.06974936 0.07213354 0.07219601 0.08236074 0.0727253 0.08393288] mean value: 0.07209484577178955 key: score_time value: [0.02332783 0.02187681 0.02655411 0.03225303 0.02155733 0.02742934 0.03805351 0.03988814 0.02360201 0.04238701] mean value: 0.029692912101745607 key: test_mcc value: [0.96551724 0.9321832 0.92980296 0.96551724 0.9321832 0.8951918 1. 0.9321832 1. 1. ] mean value: 0.9552578839217298 key: train_mcc value: [0.99610889 0.9922027 1. 0.9922027 0.99610889 0.99223298 0.99610895 0.99610895 0.99223298 0.99610895] mean value: 0.9949415988478287 key: test_accuracy value: [0.98245614 0.96491228 0.96491228 0.98245614 0.96491228 0.94736842 1. 0.96491228 1. 1. ] mean value: 0.9771929824561403 key: train_accuracy value: [0.99805068 0.99610136 1. 0.99610136 0.99805068 0.99610136 0.99805068 0.99805068 0.99610136 0.99805068] mean value: 0.9974658869395712 key: test_fscore value: [0.98245614 0.96428571 0.96551724 0.98245614 0.96428571 0.94545455 1. 0.96551724 1. 1. ] mean value: 0.9769972737486349 key: train_fscore value: [0.99804305 0.99609375 1. 0.99609375 0.99804305 0.99609375 0.99805068 0.99805068 0.99609375 0.99805068] mean value: 0.9974613152458772 key: test_precision value: [1. 1. 0.96551724 1. 1. 0.96296296 1. 0.93333333 1. 1. ] mean value: 0.9861813537675607 key: train_precision value: [1. 0.99609375 1. 0.99609375 1. 1. 1. 1. 1. 1. 
] mean value: 0.99921875 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] key: test_recall value: [0.96551724 0.93103448 0.96551724 0.96551724 0.93103448 0.92857143 1. 1. 1. 1. ] mean value: 0.9687192118226601 key: train_recall value: [0.99609375 0.99609375 1. 0.99609375 0.99609375 0.9922179 0.99610895 0.99610895 0.9922179 0.99610895] mean value: 0.9957137645914397 key: test_roc_auc value: [0.98275862 0.96551724 0.96490148 0.98275862 0.96551724 0.94704433 1. 0.96551724 1. 1. ] mean value: 0.9774014778325123 key: train_roc_auc value: [0.99804688 0.99610135 1. 0.99610135 0.99804688 0.99610895 0.99805447 0.99805447 0.99610895 0.99805447] mean value: 0.9974677772373541 key: test_jcc value: [0.96551724 0.93103448 0.93333333 0.96551724 0.93103448 0.89655172 1. 0.93333333 1. 1. ] mean value: 0.9556321839080459 key: train_jcc value: [0.99609375 0.9922179 1. 
0.9922179 0.99609375 0.9922179 0.99610895 0.99610895 0.9922179 0.99610895] mean value: 0.9949385943579767 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.15817857 0.20125985 0.22563338 0.20408607 0.2192142 0.20384526 0.19948983 0.22740364 0.22744894 0.22111392] mean value: 0.20876736640930177 key: score_time value: [0.01797915 0.03252816 0.03050971 0.02747774 0.02705717 0.02753019 0.02618265 0.02748179 0.02658629 0.02650428] mean value: 0.02698371410369873 key: test_mcc value: [0.89988258 0.89988258 0.68472906 0.72064772 0.61453202 0.79161589 0.82490815 0.61805122 0.96547546 0.61453202] mean value: 0.7634256700967201 key: train_mcc value: [0.97663743 0.98443509 0.98831147 0.9766081 0.98443509 0.98051435 0.9766081 0.98831165 0.98051435 0.97663814] mean value: 0.9813013765941954 key: test_accuracy value: [0.94736842 0.94736842 0.84210526 0.85964912 0.80701754 0.89473684 0.9122807 0.80701754 0.98245614 0.80701754] mean value: 0.8807017543859649 key: train_accuracy value: [0.98830409 0.99220273 0.99415205 0.98830409 0.99220273 0.99025341 0.98830409 0.99415205 0.99025341 0.98830409] mean value: 0.9906432748538011 key: test_fscore 
value: [0.94545455 0.94545455 0.84210526 0.86666667 0.80701754 0.89655172 0.90909091 0.81355932 0.98181818 0.80701754] mean value: 0.8814736245533871 key: train_fscore value: [0.98823529 0.99215686 0.99412916 0.98828125 0.99215686 0.99025341 0.98832685 0.99415205 0.99025341 0.98828125] mean value: 0.9906226395765302 key: test_precision value: [1. 1. 0.85714286 0.83870968 0.82142857 0.86666667 0.92592593 0.77419355 1. 0.79310345] mean value: 0.8877170695246335 key: train_precision value: [0.99212598 0.99606299 0.99607843 0.98828125 0.99606299 0.9921875 0.98832685 0.99609375 0.9921875 0.99215686] mean value: 0.9929564110870611 key: test_recall value: [0.89655172 0.89655172 0.82758621 0.89655172 0.79310345 0.92857143 0.89285714 0.85714286 0.96428571 0.82142857] mean value: 0.8774630541871922 key: train_recall value: [0.984375 0.98828125 0.9921875 0.98828125 0.98828125 0.98832685 0.98832685 0.9922179 0.98832685 0.9844358 ] mean value: 0.9883040491245136 key: test_roc_auc value: [0.94827586 0.94827586 0.84236453 0.85899015 0.80726601 0.8953202 0.91194581 0.80788177 0.98214286 0.80726601] mean value: 0.8809729064039409 key: train_roc_auc value: [0.98829645 0.9921951 0.99414822 0.98830405 0.9921951 0.99025717 0.98830405 0.99415582 0.99025717 0.98831165] mean value: 0.9906424793287938 key: test_jcc value: [0.89655172 0.89655172 0.72727273 0.76470588 0.67647059 0.8125 0.83333333 0.68571429 0.96428571 0.67647059] mean value: 0.7933856567705452 key: train_jcc value: [0.97674419 0.9844358 0.98832685 0.97683398 0.9844358 0.98069498 0.97692308 0.98837209 0.98069498 0.97683398] mean value: 0.9814295714630527 MCC on Blind test: 0.45 Accuracy on Blind test: 0.75 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.7267344 0.71088648 0.71327186 0.71637559 0.70127916 0.71965098 0.7172935 0.713516 0.71736693 0.7136898 ] mean value: 0.7150064706802368 key: score_time value: [0.00998569 0.00966191 0.00968909 0.00982118 0.00983381 0.01384377 0.00946999 0.00947881 0.0097661 0.00945377] mean value: 0.010100412368774413 key: test_mcc value: [0.96551724 0.96551724 1. 0.96551724 0.96551724 0.92980296 0.96551724 0.8953202 1. 1. ] mean value: 0.9652709359605911 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 1. 0.98245614 0.98245614 0.96491228 0.98245614 0.94736842 1. 1. ] mean value: 0.9824561403508771 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 1. 0.98245614 0.98245614 0.96428571 0.98245614 0.94736842 1. 1. ] mean value: 0.9823934837092732 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 0.96428571 0.96551724 0.93103448 1. 1. ] mean value: 0.9860837438423645 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96551724 0.96551724 1. 0.96551724 0.96551724 0.96428571 1. 0.96428571 1. 1. ] mean value: 0.979064039408867 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 1. 0.98275862 0.98275862 0.96490148 0.98275862 0.9476601 1. 1. 
] mean value: 0.9826354679802957 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 1. 0.96551724 0.96551724 0.93103448 0.96551724 0.9 1. 1. ] mean value: 0.9658620689655173 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03299451 0.03226686 0.03665662 0.03213477 0.0319376 0.03235912 0.03242207 0.03254771 0.04732156 0.06062531] mean value: 0.03712661266326904 key: score_time value: [0.01289058 0.01308775 0.01283455 0.014925 0.01495719 0.01515841 0.01587915 0.01514864 0.01835227 0.01392794] mean value: 0.014716148376464844 key: test_mcc value: [0.76689254 0.65104858 0.6317806 0.61805122 0.6166424 0.61805122 0.79161589 0.54592083 0.69397486 0.54592083] mean value: 0.6479898980537036 key: train_mcc value: [0.93577244 0.87861783 0.93865489 0.91759789 0.86838482 0.91188178 0.90330592 0.91232594 0.87173285 0.92677222] mean value: 0.9065046573444461 key: test_accuracy value: [0.87719298 0.8245614 0.80701754 0.80701754 0.80701754 0.80701754 0.89473684 0.75438596 0.84210526 0.75438596] mean value: 0.8175438596491228 key: train_accuracy value: [0.9668616 0.93567251 0.96881092 0.95711501 0.9337232 0.95516569 0.94931774 0.95516569 0.93177388
0.96296296] mean value: 0.9516569200779726 key: test_fscore value: [0.86792453 0.82142857 0.83076923 0.8 0.81967213 0.81355932 0.89655172 0.78787879 0.82352941 0.78787879] mean value: 0.8249192495341341 key: train_fscore value: [0.96565657 0.93110647 0.96946565 0.95510204 0.932 0.95652174 0.94672131 0.9566855 0.92693111 0.96380952] mean value: 0.9503999907089703 key: test_precision value: [0.95833333 0.85185185 0.75 0.84615385 0.78125 0.77419355 0.86666667 0.68421053 0.91304348 0.68421053] mean value: 0.8109913777285244 key: train_precision value: [1. 1. 0.94776119 1. 0.95491803 0.93014706 1. 0.9270073 1. 0.94402985] mean value: 0.9703863435656607 key: test_recall value: [0.79310345 0.79310345 0.93103448 0.75862069 0.86206897 0.85714286 0.92857143 0.92857143 0.75 0.92857143] mean value: 0.8530788177339902 key: train_recall value: [0.93359375 0.87109375 0.9921875 0.9140625 0.91015625 0.9844358 0.89883268 0.98832685 0.86381323 0.9844358 ] mean value: 0.9340938107976654 key: test_roc_auc value: [0.87869458 0.82512315 0.80480296 0.80788177 0.80603448 0.80788177 0.8953202 0.75738916 0.84051724 0.75738916] mean value: 0.8181034482758621 key: train_roc_auc value: [0.96679688 0.93554688 0.9688564 0.95703125 0.93367735 0.95510852 0.94941634 0.95510092 0.93190661 0.96292102] mean value: 0.9516362171692607 key: test_jcc value: [0.76666667 0.6969697 0.71052632 0.66666667 0.69444444 0.68571429 0.8125 0.65 0.7 0.65 ] mean value: 0.7033488076251234 key: train_jcc value: [0.93359375 0.87109375 0.94074074 0.9140625 0.87265918 0.91666667 0.89883268 0.91696751 0.86381323 0.93014706] mean value: 0.9058577065683058 MCC on Blind test: 0.2 Accuracy on Blind test: 0.61 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02408338 0.05451012 0.04921699 0.03914237 0.03929567 0.03930926 0.03907394 0.03979349 0.03902602 0.03887081] mean value: 0.04023220539093018 key: score_time value: [0.01881361 0.01886582 0.0218544 0.01910973 0.01883149 0.01882291 0.0191009 0.01888132 0.01903081 0.01887798] mean value: 0.01921889781951904 key: test_mcc value: [0.96551724 0.8953202 0.8951918 0.96551724 0.92980296 0.9321832 0.9321832 0.86851042 0.8953202 0.86189955] mean value: 0.9141445996607607 key: train_mcc value: [0.9610433 0.96907736 0.9610433 0.96127828 0.97289533 0.96892768 0.96127477 0.96127477 0.96127477 0.97277537] mean value: 0.9650864958366505 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.98245614 0.96491228 0.96491228 0.96491228 0.92982456 0.94736842 0.92982456] mean value: 0.956140350877193 key: train_accuracy value: [0.98050682 0.98440546 0.98050682 0.98050682 0.98635478 0.98440546 0.98050682 0.98050682 0.98050682 0.98635478] mean value: 0.9824561403508771 key: test_fscore value: [0.98245614 0.94736842 0.94915254 0.98245614 0.96551724 0.96551724 0.96551724 0.93333333 0.94736842 0.93103448] mean value: 0.9569721205409785
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:151: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore value: [0.98054475 0.98455598 0.98054475 0.98069498 0.98646035 0.98455598 0.98076923 0.98076923 0.98076923 0.98646035] mean value: 0.9826124832603018 key: test_precision value: [1. 0.96428571 0.93333333 1. 0.96551724 0.93333333 0.93333333 0.875 0.93103448 0.9 ] mean value: 0.9435837438423645 key: train_precision value: [0.97674419 0.97328244 0.97674419 0.96946565 0.97701149 0.97701149 0.96958175 0.96958175 0.96958175 0.98076923] mean value: 0.9739773930119343 key: test_recall value: [0.96551724 0.93103448 0.96551724 0.96551724 0.96551724 1. 1. 1. 0.96428571 0.96428571] mean value: 0.972167487684729 key: train_recall value: [0.984375 0.99609375 0.984375 0.9921875 0.99609375 0.9922179 0.9922179 0.9922179 0.9922179 0.9922179 ] mean value: 0.9914214494163425 key: test_roc_auc value: [0.98275862 0.9476601 0.94704433 0.98275862 0.96490148 0.96551724 0.96551724 0.93103448 0.9476601 0.93041872] mean value: 0.9565270935960591 key: train_roc_auc value: [0.98051435 0.9844282 0.98051435 0.98052955 0.98637372 0.9843902 0.98048395 0.98048395 0.98048395 0.98634332] mean value: 0.9824545537451362 key: test_jcc value: [0.96551724 0.9 0.90322581 0.96551724 0.93333333 0.93333333 0.93333333 0.875 0.9 0.87096774] mean value: 0.9180228031145717 key: train_jcc value: [0.96183206 0.96958175 0.96183206 0.96212121 0.97328244 0.96958175 0.96226415 0.96226415 0.96226415 0.97328244] mean value: 0.9658306170683848 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.26162863 0.29615593 0.30786967 0.42793369 0.33886909 0.30964303 0.3056078 0.28437376 0.28493261 0.29579973] mean value: 0.3112813949584961 key: score_time value: [0.01991153 0.02569556 0.01902437 0.01899886 0.01900196 0.01907182 0.01891279 0.01901984 0.01902318 0.01902699] mean value: 0.019768691062927245 key: test_mcc value: [0.96551724 0.8953202 0.8951918 0.89988258 0.92980296 0.9321832 0.9321832 0.79778885 0.9321832 0.86189955] mean value: 0.9041952772966289 key: train_mcc value: [0.9610433 0.96907736 0.9610433 0.96884072 0.97289533 0.9688108 0.96127477 0.9611292 0.9611292 0.97277537] mean value: 0.9658019355184825 key: test_accuracy value: [0.98245614 0.94736842 0.94736842 0.94736842 0.96491228 0.96491228 0.96491228 0.89473684 0.96491228 0.92982456] mean value: 0.9508771929824561 key: train_accuracy value: [0.98050682 0.98440546 0.98050682 0.98440546 0.98635478 0.98440546 0.98050682 0.98050682 0.98050682 0.98635478] mean value: 0.9828460038986355 key: test_fscore value: [0.98245614 0.94736842 0.94915254 0.94545455 0.96551724 0.96551724 0.96551724 0.9 0.96551724 0.93103448] mean value: 0.9517535097506797 key: train_fscore value: [0.98054475 0.98455598 0.98054475 0.9844358 0.98646035 0.9844358 0.98076923 0.98069498 0.98069498 0.98646035] mean value: 0.9829596962534292 key: test_precision value: 
[1. 0.96428571 0.93333333 1. 0.96551724 0.93333333 0.93333333 0.84375 0.93333333 0.9 ] mean value: 0.9406886288998358 key: train_precision value: [0.97674419 0.97328244 0.97674419 0.98062016 0.97701149 0.9844358 0.96958175 0.97318008 0.97318008 0.98076923] mean value: 0.9765549394873483 key: test_recall value: [0.96551724 0.93103448 0.96551724 0.89655172 0.96551724 1. 1. 0.96428571 1. 0.96428571] mean value: 0.9652709359605911 key: train_recall value: [0.984375 0.99609375 0.984375 0.98828125 0.99609375 0.9844358 0.9922179 0.98832685 0.98832685 0.9922179 ] mean value: 0.9894744041828794 key: test_roc_auc value: [0.98275862 0.9476601 0.94704433 0.94827586 0.96490148 0.96551724 0.96551724 0.89593596 0.96551724 0.93041872] mean value: 0.9513546798029557 key: train_roc_auc value: [0.98051435 0.9844282 0.98051435 0.984413 0.98637372 0.9844054 0.98048395 0.98049155 0.98049155 0.98634332] mean value: 0.9828459387159534 key: test_jcc value: [0.96551724 0.9 0.90322581 0.89655172 0.93333333 0.93333333 0.93333333 0.81818182 0.93333333 0.87096774] mean value: 0.908777766541949 key: train_jcc value: [0.96183206 0.96958175 0.96183206 0.96934866 0.97328244 0.96934866 0.96226415 0.96212121 0.96212121 0.97328244] mean value: 0.9665014649876501 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0320878 0.0302875 0.03224111 0.03226137 0.03130984 0.02569366 0.0313189 0.03118372 0.0292275 0.03123951]
mean value: 0.03068509101867676

key: score_time
value: [0.01438856 0.01213551 0.01232028 0.01176858 0.0118649 0.01178813 0.01177907 0.01186204 0.01174831 0.0119381 ]
mean value: 0.012159347534179688

key: test_mcc
value: [0.87447463 0.6681531 0.53145678 0.87082337 0.65714286 0.65714286 0.7320658 0.37799476 0.79426746 0.79426746]
mean value: 0.695778908240471

key: train_mcc
value: [0.87229005 0.89479986 0.87133216 0.91763327 0.85847606 0.86558426 0.87926178 0.86403922 0.87198258 0.87926178]
mean value: 0.8774661011532328

key: test_accuracy
value: [0.93333333 0.83333333 0.75862069 0.93103448 0.82758621 0.82758621 0.86206897 0.68965517 0.89655172 0.89655172]
mean value: 0.845632183908046

key: train_accuracy
value: [0.9351145 0.94656489 0.93536122 0.9581749 0.92775665 0.93155894 0.9391635 0.93155894 0.93536122 0.9391635 ]
mean value: 0.9379778248628566

key: test_fscore
value: [0.9375 0.83870968 0.77419355 0.93333333 0.82758621 0.82758621 0.85714286 0.70967742 0.90322581 0.90322581]
mean value: 0.851218086233381

key: train_fscore
value: [0.93726937 0.94814815 0.93680297 0.95940959 0.93090909 0.93430657 0.94029851 0.93283582 0.93680297 0.94029851]
mean value: 0.9397081558966259

key: test_precision
value: [0.88235294 0.8125 0.70588235 0.875 0.8 0.8 0.92307692 0.6875 0.875 0.875 ]
mean value: 0.823631221719457

key: train_precision
value: [0.90714286 0.92086331 0.91970803 0.9352518 0.8951049 0.90140845 0.91970803 0.91240876 0.91304348 0.91970803]
mean value: 0.9144347635841845

key: test_recall
value: [1. 0.86666667 0.85714286 1. 0.85714286 0.85714286 0.8 0.73333333 0.93333333 0.93333333]
mean value: 0.8838095238095238

key: train_recall
value: [0.96946565 0.97709924 0.95454545 0.98484848 0.96969697 0.96969697 0.96183206 0.95419847 0.96183206 0.96183206]
mean value: 0.9665047420772612

key: test_roc_auc
value: [0.93333333 0.83333333 0.76190476 0.93333333 0.82857143 0.82857143 0.86428571 0.68809524 0.8952381 0.8952381 ]
mean value: 0.8461904761904763

key: train_roc_auc
value: [0.9351145 0.94656489 0.93528799 0.9580731 0.92759658 0.93141337 0.93924936 0.93164469 0.93546149 0.93924936]
mean value: 0.9379655331945409

key: test_jcc
value: [0.88235294 0.72222222 0.63157895 0.875 0.70588235 0.70588235 0.75 0.55 0.82352941 0.82352941]
mean value: 0.7469977640178879

key: train_jcc
value: [0.88194444 0.90140845 0.88111888 0.92198582 0.8707483 0.87671233 0.88732394 0.87412587 0.88111888 0.88732394]
mean value: 0.8863810862525938

MCC on Blind test: 0.89
Accuracy on Blind test: 0.94

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.72078919 0.87093186 0.76989698 0.7747066 0.90087986 0.77547503 0.76327038 0.875139 0.7889235 0.83031797] mean value: 0.8070330381393432 key: score_time value: [0.01421833 0.01484942 0.01467299 0.01446486 0.01456046 0.01565599 0.01448941 0.01475024 0.01570296 0.01492667] mean value: 0.014829134941101075 key: test_mcc value: [0.93541435 0.80178373 0.72380952 0.86965655 0.93333333 0.72954522 0.7320658 0.58571429 0.79426746 0.93302503] mean value: 0.8038615283681674 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99242424 1. ] mean value: 0.9992424242424243 key: test_accuracy value: [0.96666667 0.9 0.86206897 0.93103448 0.96551724 0.86206897 0.86206897 0.79310345 0.89655172 0.96551724] mean value: 0.9004597701149425 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99619772 1. ] mean value: 0.9996197718631179 key: test_fscore value: [0.96774194 0.90322581 0.85714286 0.92307692 0.96551724 0.84615385 0.85714286 0.8 0.90322581 0.96774194] mean value: 0.8990969208766761 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99619772 1. ] mean value: 0.9996197718631179 key: test_precision value: [0.9375 0.875 0.85714286 1. 0.93333333 0.91666667 0.92307692 0.8 0.875 0.9375 ] mean value: 0.905521978021978 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99242424 1. ] mean value: 0.9992424242424243 key: test_recall value: [1. 0.93333333 0.85714286 0.85714286 1. 0.78571429 0.8 0.8 0.93333333 1. ] mean value: 0.8966666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96666667 0.9 0.86190476 0.92857143 0.96666667 0.85952381 0.86428571 0.79285714 0.8952381 0.96428571] mean value: 0.9 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99621212 1. ] mean value: 0.9996212121212121 key: test_jcc value: [0.9375 0.82352941 0.75 0.85714286 0.93333333 0.73333333 0.75 0.66666667 0.82352941 0.9375 ] mean value: 0.8212535014005602 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99242424 1. ] mean value: 0.9992424242424243 MCC on Blind test: 0.67 Accuracy on Blind test: 0.83 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01313186 0.00966859 0.00940609 0.00909376 0.00917745 0.00930548 0.00901055 0.00931048 0.00922346 0.00923944] mean value: 0.009656715393066406 key: score_time value: [0.01565719 0.00918484 0.00901389 0.0087471 0.00874877 0.00872231 0.00879788 0.00867724 0.00927162 0.00879645] mean value: 0.009561729431152344 key: test_mcc value: [0.47087096 0.13608276 0.59628479 0.67156812 0.7952381 0.51675233 0.7320658 0.44761905 0.75093926 0.60575767] mean value: 0.5723178827436838 key: train_mcc value: [0.64962264 0.52565029 0.72322987 0.68527926 0.72970342 0.62009723 0.70257774 0.72484208 0.687232 0.64958558] mean value: 0.6697820100068201 
key: test_accuracy value: [0.73333333 0.56666667 0.75862069 0.82758621 0.89655172 0.75862069 0.86206897 0.72413793 0.86206897 0.79310345] mean value: 0.7782758620689655 key: train_accuracy value: [0.82061069 0.73664122 0.85931559 0.8365019 0.85931559 0.80988593 0.84790875 0.85931559 0.84030418 0.82129278] mean value: 0.8291092212579456 key: test_fscore value: [0.75 0.60606061 0.8 0.83870968 0.89655172 0.74074074 0.85714286 0.73333333 0.88235294 0.82352941] mean value: 0.7928421291776 key: train_fscore value: [0.83392226 0.78369906 0.86738351 0.85121107 0.87108014 0.80769231 0.85714286 0.86738351 0.85 0.83274021] mean value: 0.8422254936530311 key: test_precision value: [0.70588235 0.55555556 0.66666667 0.76470588 0.86666667 0.76923077 0.92307692 0.73333333 0.78947368 0.73684211] mean value: 0.7511433939297716 key: train_precision value: [0.77631579 0.66489362 0.82312925 0.78343949 0.80645161 0.8203125 0.80536913 0.81756757 0.79865772 0.78 ] mean value: 0.7876136674749878 key: test_recall value: [0.8 0.66666667 1. 0.92857143 0.92857143 0.71428571 0.8 0.73333333 1. 
0.93333333] mean value: 0.8504761904761905 key: train_recall value: [0.90076336 0.95419847 0.91666667 0.93181818 0.9469697 0.79545455 0.91603053 0.92366412 0.90839695 0.89312977] mean value: 0.9087092297015961 key: test_roc_auc value: [0.73333333 0.56666667 0.76666667 0.83095238 0.89761905 0.75714286 0.86428571 0.72380952 0.85714286 0.78809524] mean value: 0.7785714285714286 key: train_roc_auc value: [0.82061069 0.73664122 0.85909669 0.8361381 0.85898103 0.80994101 0.84816678 0.85955933 0.84056211 0.82156489] mean value: 0.8291261855193153 key: test_jcc value: [0.6 0.43478261 0.66666667 0.72222222 0.8125 0.58823529 0.75 0.57894737 0.78947368 0.7 ] mean value: 0.6642827844333767 key: train_jcc value: [0.71515152 0.6443299 0.76582278 0.74096386 0.77160494 0.67741935 0.75 0.76582278 0.73913043 0.71341463] mean value: 0.7283660199139936 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00948977 0.00949192 0.0095396 0.00949955 0.00931692 0.00925446 0.00933433 0.0091722 0.00922704 0.00945187] mean value: 0.009377765655517577 key: score_time value: [0.00895619 0.00884056 0.00899911 0.00888467 0.00869846 0.00875378 0.00876188 0.00872612 0.00863218 0.00872183] mean value: 0.008797478675842286 key: test_mcc value: [0.6 0.47087096 0.6555099 0.51675233 0.51904762 0.51904762 0.47079191 0.37799476 0.58943389 0.59330823] mean value: 0.531275719579118 key: train_mcc value: [0.6261184 0.69498055 0.62787882 0.68095573 0.65781864 0.65781864 0.59892463 0.68823734 0.63502806 0.65024005] mean value: 0.6518000864251088 key: test_accuracy value: [0.8 0.73333333 0.82758621 0.75862069 0.75862069 0.75862069 0.72413793 0.68965517 0.79310345 0.79310345] mean value: 0.7636781609195402 key: train_accuracy value: [0.8129771 0.84732824 0.81368821 0.84030418 0.82889734 0.82889734 0.79847909 0.84410646 0.81749049 0.82509506] mean value: 0.8257263518416393 key: test_fscore value: [0.8 0.71428571 0.81481481 0.74074074 0.75862069 0.75862069 0.69230769 0.70967742 0.8125 0.78571429] mean value: 0.7587282046528431 key: train_fscore value: [0.81081081 0.84962406 0.81081081 0.83846154 0.82889734 0.82889734 0.78884462 0.84410646 0.81538462 0.82307692] mean value: 0.823891452089343 key: test_precision value: [0.8 0.76923077 0.84615385 0.76923077 0.73333333 0.73333333 0.81818182 0.6875 0.76470588 0.84615385] mean value: 0.7767823597970657 key: train_precision value: [0.8203125 0.83703704 0.82677165 0.8515625 0.83206107 0.83206107 0.825 0.84090909 0.82170543 0.82945736] mean value: 
0.831687770959169 key: test_recall value: [0.8 0.66666667 0.78571429 0.71428571 0.78571429 0.78571429 0.6 0.73333333 0.86666667 0.73333333] mean value: 0.7471428571428571 key: train_recall value: [0.80152672 0.86259542 0.79545455 0.82575758 0.82575758 0.82575758 0.75572519 0.84732824 0.80916031 0.81679389] mean value: 0.816585704371964 key: test_roc_auc value: [0.8 0.73333333 0.82619048 0.75714286 0.75952381 0.75952381 0.72857143 0.68809524 0.79047619 0.7952381 ] mean value: 0.7638095238095238 key: train_roc_auc value: [0.8129771 0.84732824 0.81375781 0.8403597 0.82890932 0.82890932 0.79831714 0.84411867 0.81745894 0.82506361] mean value: 0.8257199861207495 key: test_jcc value: [0.66666667 0.55555556 0.6875 0.58823529 0.61111111 0.61111111 0.52941176 0.55 0.68421053 0.64705882] mean value: 0.6130860853113176 key: train_jcc value: [0.68181818 0.73856209 0.68181818 0.7218543 0.70779221 0.70779221 0.65131579 0.73026316 0.68831169 0.69934641] mean value: 0.7008874216268676 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00884247 0.01010442 0.01012564 0.00979137 0.0101862 0.01019549 0.01009035 0.01044989 0.01014018 0.01014948] mean value: 0.010007548332214355 key: score_time value: [0.0144124 0.01187205 0.01217246 0.01165318 0.01194263 0.01190805 0.01191378 0.01192284 0.01200461 0.01685095] mean value: 0.012665295600891113 key: test_mcc value: [0.40824829 0.06726728 0.51904762 0.37799476 0.31579309 0.44761905 0.41051346 0.44932255 0.44932255 0.45455066] mean value: 0.38996793082617276 key: train_mcc value: [0.6261184 0.67938931 0.65024005 0.65831512 0.61981608 0.62737841 0.65779886 0.71102244 0.62739995 0.62737841] mean value: 0.6484857019359617 key: test_accuracy value: [0.7 0.53333333 0.75862069 0.68965517 0.65517241 0.72413793 0.68965517 0.72413793 0.72413793 0.72413793] mean value: 0.6922988505747126 key: train_accuracy value: [0.8129771 0.83969466 0.82509506 0.82889734 0.80988593 0.81368821 0.82889734 0.85551331 0.81368821 0.81368821] mean value: 0.8242025367892492 key: test_fscore value: [0.72727273 0.5625 0.75862069 0.66666667 0.66666667 0.71428571 0.64 0.75 0.75 0.71428571] mean value: 0.6950298178832661 key: train_fscore value: [0.81509434 0.83969466 0.82706767 0.82625483 0.81203008 0.81509434 0.82758621 0.85496183 0.81368821 0.81226054] mean value: 0.8243732694633406 key: test_precision value: [0.66666667 0.52941176 0.73333333 0.69230769 0.625 0.71428571 0.8 0.70588235 0.70588235 0.76923077] mean value: 0.6942000646412412 key: train_precision value: [0.80597015 0.83969466 0.82089552 0.84251969 0.80597015 0.81203008 0.83076923 0.85496183 0.81060606 0.81538462] mean 
value: 0.8238801976432387 key: test_recall value: [0.8 0.6 0.78571429 0.64285714 0.71428571 0.71428571 0.53333333 0.8 0.8 0.66666667] mean value: 0.7057142857142857 key: train_recall value: [0.82442748 0.83969466 0.83333333 0.81060606 0.81818182 0.81818182 0.82442748 0.85496183 0.81679389 0.80916031] mean value: 0.8249768679157993 key: test_roc_auc value: [0.7 0.53333333 0.75952381 0.68809524 0.65714286 0.72380952 0.6952381 0.72142857 0.72142857 0.72619048] mean value: 0.6926190476190477 key: train_roc_auc value: [0.8129771 0.83969466 0.82506361 0.82896715 0.80985427 0.81367106 0.82888041 0.85551122 0.81369998 0.81367106] mean value: 0.8241990515845478 key: test_jcc value: [0.57142857 0.39130435 0.61111111 0.5 0.5 0.55555556 0.47058824 0.6 0.6 0.55555556] mean value: 0.5355543376770998 key: train_jcc value: [0.68789809 0.72368421 0.70512821 0.70394737 0.6835443 0.68789809 0.70588235 0.74666667 0.68589744 0.68387097] mean value: 0.7014417689464205 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01403451 0.01422858 0.01349854 0.01356149 0.01517701 0.01492858 0.01510477 0.0139873 0.01390624 0.01416183] mean value: 0.014258885383605957 key: score_time value: [0.0101316 0.01045656 0.01018333 0.01049423 0.01039624 0.01010704 0.00998545 0.01039577 0.01092601 0.0106802 ] mean value: 0.010375642776489257 key: test_mcc value: [0.76088591 0.6 0.59330823 0.75522869 0.7320658 0.49891416 0.6130103 0.52473682 0.6669552 0.72954522] mean value: 0.6474650323376637 key: train_mcc value: [0.78526617 0.82149863 0.77061242 0.77908214 0.78449388 0.77753667 0.80694112 0.77773853 0.77636354 0.79584333] mean value: 0.7875376442859248 key: test_accuracy value: [0.86666667 0.8 0.79310345 0.86206897 0.86206897 0.72413793 0.79310345 0.75862069 0.82758621 0.86206897] mean value: 0.8149425287356322 key: train_accuracy value: [0.88931298 0.90839695 0.88212928 0.88593156 0.88973384 0.88593156 0.90114068 0.88593156 0.88212928 0.8973384 ] mean value: 0.890797608335994 key: test_fscore value: [0.88235294 0.8 0.8 0.875 0.86666667 0.76470588 0.76923077 0.78787879 0.84848485 0.875 ] mean value: 0.8269319895790483 key: train_fscore value: [0.89605735 0.91304348 0.88967972 0.89361702 0.89605735 0.89285714 0.9057971 0.89208633 0.89122807 0.89962825] mean value: 0.8970051808385671 key: test_precision value: [0.78947368 0.8 0.75 0.77777778 0.8125 0.65 0.90909091 0.72222222 0.77777778 0.82352941] mean value: 0.7812371782843919 key: train_precision value: [0.84459459 0.86896552 0.83892617 0.84 0.85034014 0.84459459 0.86206897 0.84353741 0.82467532 0.87681159] mean value: 0.8494514316343086 key: 
test_recall value: [1. 0.8 0.85714286 1. 0.92857143 0.92857143 0.66666667 0.86666667 0.93333333 0.93333333] mean value: 0.8914285714285715 key: train_recall value: [0.95419847 0.96183206 0.9469697 0.95454545 0.9469697 0.9469697 0.95419847 0.94656489 0.96946565 0.92366412] mean value: 0.9505378209576683 key: test_roc_auc value: [0.86666667 0.8 0.7952381 0.86666667 0.86428571 0.73095238 0.79761905 0.7547619 0.82380952 0.85952381] mean value: 0.815952380952381 key: train_roc_auc value: [0.88931298 0.90839695 0.8818818 0.88566967 0.88951538 0.88569859 0.90134166 0.88616123 0.8824601 0.89743812] mean value: 0.8907876474670368 key: test_jcc value: [0.78947368 0.66666667 0.66666667 0.77777778 0.76470588 0.61904762 0.625 0.65 0.73684211 0.77777778] mean value: 0.7073958179763133 key: train_jcc value: [0.81168831 0.84 0.80128205 0.80769231 0.81168831 0.80645161 0.82781457 0.80519481 0.80379747 0.81756757] mean value: 0.8133177005907435 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.13563442 1.24090719 1.12091327 1.32892895 1.12880158 1.26187825 1.13260579 1.25287151 1.14153361 1.26136589] mean value: 1.200544047355652 key: score_time value: [0.01544094 0.01418996 0.01236367 0.01428485 0.0153091 0.01543164 0.01432538 0.01541853 0.01238394 0.01642776] mean value: 0.014557576179504395 key: test_mcc value: [0.93541435 0.6 0.65714286 0.65714286 0.7320658 0.58571429 0.67156812 0.51675233 0.6669552 0.86965655] mean value: 0.6892412343569269 key: train_mcc value: [0.99239533 1. 1. 1. 1. 1. 1. 0.99242424 1. 0.99242424] mean value: 0.9977243811746231 key: test_accuracy value: [0.96666667 0.8 0.82758621 0.82758621 0.86206897 0.79310345 0.82758621 0.75862069 0.82758621 0.93103448] mean value: 0.842183908045977 key: train_accuracy value: [0.99618321 1. 1. 1. 1. 1. 1. 0.99619772 1. 0.99619772] mean value: 0.9988578643369228 key: test_fscore value: [0.96774194 0.8 0.82758621 0.82758621 0.86666667 0.78571429 0.81481481 0.77419355 0.84848485 0.9375 ] mean value: 0.8450288513344687 key: train_fscore value: [0.99619772 1. 1. 1. 1. 1. 1. 0.99619772 1. 
0.99619772] mean value: 0.9988593155893536 key: test_precision value: [0.9375 0.8 0.8 0.8 0.8125 0.78571429 0.91666667 0.75 0.77777778 0.88235294] mean value: 0.8262511671335201 key: train_precision value: [0.99242424 1. 1. 1. 1. 1. 1. 0.99242424 1. 0.99242424] mean value: 0.9977272727272727 key: test_recall value: [1. 0.8 0.85714286 0.85714286 0.92857143 0.78571429 0.73333333 0.8 0.93333333 1. ] mean value: 0.8695238095238095 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.8 0.82857143 0.82857143 0.86428571 0.79285714 0.83095238 0.75714286 0.82380952 0.92857143] mean value: 0.8421428571428572 key: train_roc_auc value: [0.99618321 1. 1. 1. 1. 1. 1. 0.99621212 1. 0.99621212] mean value: 0.9988607448531113 key: test_jcc value: [0.9375 0.66666667 0.70588235 0.70588235 0.76470588 0.64705882 0.6875 0.63157895 0.73684211 0.88235294] mean value: 0.7365970072239422 key: train_jcc value: [0.99242424 1. 1. 1. 1. 1. 1. 0.99242424 1. 0.99242424] mean value: 0.9977272727272727 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, 
booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01959491 0.01731563 0.01389885 0.01488256 0.01310444 0.01357579 0.01422048 0.01348567 0.01311564 0.01370406] mean value: 0.014689803123474121 key: score_time value: [0.01209402 0.00925374 0.00893354 0.00895524 0.00898933 0.00864458 0.00868011 0.00873613 0.00873876 0.00870037] mean value: 0.009172582626342773 key: test_mcc value: [0.93541435 0.76088591 0.7952381 1. 0.93333333 0.93302503 0.93302503 0.86190476 0.86190476 0.93333333] mean value: 0.89480646108909 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96666667 0.86666667 0.89655172 1. 0.96551724 0.96551724 0.96551724 0.93103448 0.93103448 0.96551724] mean value: 0.9454022988505747 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96551724 0.88235294 0.89655172 1. 0.96551724 0.96296296 0.96774194 0.93333333 0.93333333 0.96551724] mean value: 0.9472827954565833 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.78947368 0.86666667 1. 0.93333333 1. 0.9375 0.93333333 0.93333333 1. ] mean value: 0.9393640350877193 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93333333 1. 0.92857143 1. 1. 0.92857143 1. 0.93333333 0.93333333 0.93333333] mean value: 0.959047619047619 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.86666667 0.89761905 1. 
0.96666667 0.96428571 0.96428571 0.93095238 0.93095238 0.96666667] mean value: 0.9454761904761905 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93333333 0.78947368 0.8125 1. 0.93333333 0.92857143 0.9375 0.875 0.875 0.93333333] mean value: 0.9018045112781955 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10119128 0.10048294 0.09955859 0.09904623 0.0998683 0.10016418 0.09969735 0.09930968 0.09993553 0.10012221] mean value: 0.09993762969970703 key: score_time value: [0.01779675 0.01739693 0.01740122 0.01741219 0.01740956 0.01755548 0.01746917 0.01769543 0.01732755 0.01740599] mean value: 0.01748702526092529 key: test_mcc value: [0.87447463 0.6 0.7952381 0.87082337 0.72380952 0.58571429 0.67156812 0.6555099 0.72954522 0.79426746] mean value: 0.730095059899352 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93333333 0.8 0.89655172 0.93103448 0.86206897 0.79310345 0.82758621 0.82758621 0.86206897 0.89655172] mean value: 0.8629885057471265 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9375 0.8 0.89655172 0.93333333 0.85714286 0.78571429 0.81481481 0.83870968 0.875 0.90322581] mean value: 0.8641992499014189 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.8 0.86666667 0.875 0.85714286 0.78571429 0.91666667 0.8125 0.82352941 0.875 ] mean value: 0.8494572829131652 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.8 0.92857143 1. 0.85714286 0.78571429 0.73333333 0.86666667 0.93333333 0.93333333] mean value: 0.8838095238095238 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93333333 0.8 0.89761905 0.93333333 0.86190476 0.79285714 0.83095238 0.82619048 0.85952381 0.8952381 ] mean value: 0.8630952380952381 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88235294 0.66666667 0.8125 0.875 0.75 0.64705882 0.6875 0.72222222 0.77777778 0.82352941] mean value: 0.7644607843137254 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00918102 0.00931501 0.00934839 0.00934172 0.00923967 0.00923657 0.00935435 0.00937009 0.00967383 0.00934172] mean value: 0.009340238571166993 key: score_time value: [0.00858641 0.008672 0.00861835 0.00867772 0.00864005 0.00860858 0.00866485 0.00873733 0.00887012 0.00874567] mean value: 0.00868210792541504 key: test_mcc value: [0.62254302 0.3363364 0.51904762 0.1702129 0.44932255 0.44932255 0.17703552 0.25123412 0.51904762 0.37799476] mean value: 0.387209704925739 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.66666667 0.75862069 0.5862069 0.72413793 0.72413793 0.5862069 0.62068966 0.75862069 0.68965517] mean value: 0.6914942528735633 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76923077 0.6875 0.75862069 0.5 0.69230769 0.69230769 0.57142857 0.59259259 0.75862069 0.70967742] mean value: 0.6732286116532501 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.90909091 0.64705882 0.73333333 0.6 0.75 0.75 0.61538462 0.66666667 0.78571429 0.6875 ] mean value: 0.7144748633719222 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.73333333 0.78571429 0.42857143 0.64285714 0.64285714 0.53333333 0.53333333 0.73333333 0.73333333] mean value: 0.6433333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8 0.66666667 0.75952381 0.58095238 0.72142857 0.72142857 0.58809524 0.62380952 0.75952381 0.68809524] mean value: 0.690952380952381 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.625 0.52380952 0.61111111 0.33333333 0.52941176 0.52941176 0.4 0.42105263 0.61111111 0.55 ] mean value: 0.5134241240355791 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.29 Accuracy on Blind test: 0.58 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.35667634 1.3484025 1.35595965 1.36112309 1.36566496 1.35094857 1.36121631 1.35161304 1.35443139 1.35554624] mean value: 1.3561582088470459 key: score_time value: [0.089463 0.08975649 0.08854628 0.09363389 0.08926678 0.09072804 0.08889103 0.09302664 0.08930659 0.08826613] mean value: 0.09008848667144775 key: test_mcc value: [0.86666667 0.73994007 0.87082337 0.87082337 0.93333333 0.93333333 0.87082337 0.72954522 0.80917359 0.86190476] mean value: 0.8486367077732017 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.86666667 0.93103448 0.93103448 0.96551724 0.96551724 0.93103448 0.86206897 0.89655172 0.93103448] mean value: 0.9213793103448276 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.875 0.93333333 0.93333333 0.96551724 0.96551724 0.92857143 0.875 0.90909091 0.93333333] mean value: 0.9252030153754291 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 0.82352941 0.875 0.875 0.93333333 0.93333333 1. 0.82352941 0.83333333 0.93333333] mean value: 0.8963725490196078 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93333333 0.93333333 1. 1. 1. 1. 0.86666667 0.93333333 1. 0.93333333] mean value: 0.96 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.93333333 0.86666667 0.93333333 0.93333333 0.96666667 0.96666667 0.93333333 0.85952381 0.89285714 0.93095238] mean value: 0.9216666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.77777778 0.875 0.875 0.93333333 0.93333333 0.86666667 0.77777778 0.83333333 0.875 ] mean value: 0.8622222222222222 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87463975 0.91463089 0.88259697 0.87200809 0.91593957 0.88284707 1.00016236 1.01180363 0.92674494 0.95000696] mean value: 0.9231380224227905 key: score_time value: [0.19464636 0.18248391 0.18350911 0.25170422 0.19830871 0.21641397 0.23606873 0.16629505 0.22552013 0.22263598] mean value: 0.20775861740112306 key: test_mcc value: [0.93541435 0.80178373 0.87082337 0.87082337 0.93333333 0.93333333 0.93302503 0.72954522 0.86965655 0.86190476] mean value: 0.8739643038807166 key: train_mcc value: [0.96253342 0.9699179 0.97002678 0.9553594 0.96266749 0.96266749 0.96267809 0.96267809 0.96267809 0.95537456] mean value: 0.9626581323585782 key: test_accuracy value: [0.96666667 0.9 0.93103448 0.93103448 0.96551724 0.96551724 0.96551724 0.86206897 0.93103448 0.93103448] mean value: 0.9349425287356322 key: train_accuracy value: [0.98091603 0.98473282 0.98479087 0.97718631 0.98098859 0.98098859 0.98098859 0.98098859 0.98098859 0.97718631] mean value: 0.9809755318840159 key: test_fscore value: [0.96774194 0.90322581 0.93333333 0.93333333 0.96551724 0.96551724 0.96774194 0.875 0.9375 0.93333333] mean value: 0.9382244160177976 key: train_fscore value: [0.98127341 0.98496241 0.98507463 0.97777778 0.98141264 0.98141264 0.98127341 0.98127341 0.98127341 0.97761194] mean value: 0.9813345662726205 key: test_precision value: [0.9375 0.875 0.875 0.875 0.93333333 0.93333333 0.9375 0.82352941 0.88235294 0.93333333] mean value: 0.9005882352941177 key: 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( train_precision value: [0.96323529 0.97037037 0.97058824 0.95652174 0.96350365 0.96350365 0.96323529 0.96323529 0.96323529 0.95620438] mean value: 0.9633633200097628 key: test_recall value: [1. 0.93333333 1. 1. 1. 1. 1. 0.93333333 1. 0.93333333] mean value: 0.98 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.9 0.93333333 0.93333333 0.96666667 0.96666667 0.96428571 0.85952381 0.92857143 0.93095238] mean value: 0.935 key: train_roc_auc value: [0.98091603 0.98473282 0.98473282 0.97709924 0.98091603 0.98091603 0.98106061 0.98106061 0.98106061 0.97727273] mean value: 0.9809767522553783 key: test_jcc value: [0.9375 0.82352941 0.875 0.875 0.93333333 0.93333333 0.9375 0.77777778 0.88235294 0.875 ] mean value: 0.8850326797385621 key: train_jcc value: [0.96323529 0.97037037 0.97058824 0.95652174 0.96350365 0.96350365 0.96323529 0.96323529 0.96323529 0.95620438] mean value: 0.9633633200097628 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0245676 0.01031852 0.01025105 0.01026773 0.00969219 0.01044655 0.01050472 0.01041341 0.01058054 0.01019526] mean value: 0.011723756790161133 key: score_time value: [0.01320982 0.00926924 0.00959635 0.00879884 0.0094924 0.0095644 0.00961423 0.00954556 0.00919628 0.00949216] mean value: 0.00977792739868164 key: test_mcc value: [0.6 0.47087096 0.6555099 0.51675233 0.51904762 0.51904762 0.47079191 0.37799476 0.58943389 0.59330823] mean value: 0.531275719579118 key: train_mcc value: [0.6261184 0.69498055 0.62787882 0.68095573 0.65781864 0.65781864 0.59892463 0.68823734 0.63502806 0.65024005] mean value: 0.6518000864251088 key: test_accuracy value: [0.8 0.73333333 0.82758621 0.75862069 0.75862069 0.75862069 0.72413793 0.68965517 0.79310345 0.79310345] mean value: 0.7636781609195402 key: train_accuracy value: [0.8129771 0.84732824 0.81368821 0.84030418 0.82889734 0.82889734 0.79847909 0.84410646 0.81749049 0.82509506] mean value: 0.8257263518416393 key: test_fscore value: [0.8 0.71428571 0.81481481 0.74074074 0.75862069 0.75862069 0.69230769 0.70967742 0.8125 0.78571429] mean value: 0.7587282046528431 key: train_fscore value: [0.81081081 0.84962406 0.81081081 0.83846154 0.82889734 0.82889734 0.78884462 0.84410646 0.81538462 0.82307692] mean value: 0.823891452089343 key: test_precision value: [0.8 0.76923077 0.84615385 0.76923077 0.73333333 0.73333333 0.81818182 0.6875 0.76470588 0.84615385] mean value: 0.7767823597970657 key: train_precision value: [0.8203125 0.83703704 0.82677165 0.8515625 0.83206107 0.83206107 0.825 0.84090909 0.82170543 0.82945736] mean value: 
0.831687770959169 key: test_recall value: [0.8 0.66666667 0.78571429 0.71428571 0.78571429 0.78571429 0.6 0.73333333 0.86666667 0.73333333] mean value: 0.7471428571428571 key: train_recall value: [0.80152672 0.86259542 0.79545455 0.82575758 0.82575758 0.82575758 0.75572519 0.84732824 0.80916031 0.81679389] mean value: 0.816585704371964 key: test_roc_auc value: [0.8 0.73333333 0.82619048 0.75714286 0.75952381 0.75952381 0.72857143 0.68809524 0.79047619 0.7952381 ] mean value: 0.7638095238095238 key: train_roc_auc value: [0.8129771 0.84732824 0.81375781 0.8403597 0.82890932 0.82890932 0.79831714 0.84411867 0.81745894 0.82506361] mean value: 0.8257199861207495 key: test_jcc value: [0.66666667 0.55555556 0.6875 0.58823529 0.61111111 0.61111111 0.52941176 0.55 0.68421053 0.64705882] mean value: 0.6130860853113176 key: train_jcc value: [0.68181818 0.73856209 0.68181818 0.7218543 0.70779221 0.70779221 0.65131579 0.73026316 0.68831169 0.69934641] mean value: 0.7008874216268676 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.23225641 0.0465076 0.05317879 0.05914593 0.05356336 0.05773616 0.07073283 0.05968189 0.07531691 0.04838347] mean value: 0.07565033435821533 key: score_time value: [0.01092958 0.01095676 0.01085782 0.01044011 0.01059103 0.01085639 0.01082778 0.01170135 0.01140642 0.01048422] mean value: 0.010905146598815918 key: test_mcc value: [0.93541435 0.87447463 0.93333333 1. 0.93333333 1. 0.93333333 0.93302503 0.93302503 0.93333333] mean value: 0.9409272380452472 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96666667 0.93333333 0.96551724 1. 0.96551724 1. 0.96551724 0.96551724 0.96551724 0.96551724] mean value: 0.9693103448275863 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96551724 0.9375 0.96551724 1. 0.96551724 1. 0.96551724 0.96774194 0.96774194 0.96551724] mean value: 0.9700570077864294 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88235294 0.93333333 1. 0.93333333 1. 1. 0.9375 0.9375 1. ] mean value: 0.9624019607843137 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93333333 1. 1. 1. 1. 1. 0.93333333 1. 1. 0.93333333] mean value: 0.98 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.93333333 0.96666667 1. 0.96666667 1. 
0.96666667 0.96428571 0.96428571 0.96666667] mean value: 0.9695238095238096 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93333333 0.88235294 0.93333333 1. 0.93333333 1. 0.93333333 0.9375 0.9375 0.93333333] mean value: 0.9424019607843137 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04960895 0.06460238 0.06222391 0.05709147 0.06891847 0.05664587 0.06179833 0.0570817 0.0564754 0.05665326] mean value: 0.059109973907470706 key: score_time value: [0.02435279 0.02223825 0.02524686 0.02369213 0.02095008 0.02401018 0.02549863 0.02384377 0.02049375 0.0244801 ] mean value: 0.023480653762817383 key: test_mcc value: [0.86666667 0.73994007 0.7952381 0.93333333 0.6555099 0.59330823 0.45455066 0.7952381 0.65714286 0.72380952] mean value: 0.7214737425055385 key: train_mcc value: [0.97735555 0.99239533 0.98490371 0.98490371 0.98490371 0.96969173 0.98490544 0.98490544 0.97744232 0.97744232] mean value: 0.9818849266007134 key: test_accuracy value: [0.93333333 0.86666667 0.89655172 0.96551724 0.82758621 0.79310345 0.72413793 0.89655172 0.82758621 0.86206897] mean value: 
0.8593103448275862 key: train_accuracy value: [0.98854962 0.99618321 0.99239544 0.99239544 0.99239544 0.98479087 0.99239544 0.99239544 0.98859316 0.98859316] mean value: 0.9908687197051056 key: test_fscore value: [0.93333333 0.875 0.89655172 0.96551724 0.81481481 0.8 0.71428571 0.89655172 0.82758621 0.86666667] mean value: 0.8590307425652253 key: train_fscore value: [0.98867925 0.99619772 0.9924812 0.9924812 0.9924812 0.98496241 0.99242424 0.99242424 0.98867925 0.98867925] mean value: 0.9909489954366314 key: test_precision value: [0.93333333 0.82352941 0.86666667 0.93333333 0.84615385 0.75 0.76923077 0.92857143 0.85714286 0.86666667] mean value: 0.8574628312863607 key: train_precision value: [0.97761194 0.99242424 0.98507463 0.98507463 0.98507463 0.97761194 0.98496241 0.98496241 0.97761194 0.97761194] mean value: 0.9828020696245362 key: test_recall value: [0.93333333 0.93333333 0.92857143 1. 0.78571429 0.85714286 0.66666667 0.86666667 0.8 0.86666667] mean value: 0.8638095238095238 key: train_recall value: [1. 1. 1. 1. 1. 0.99242424 1. 1. 1. 1. 
] mean value: 0.9992424242424243 key: test_roc_auc value: [0.93333333 0.86666667 0.89761905 0.96666667 0.82619048 0.7952381 0.72619048 0.89761905 0.82857143 0.86190476] mean value: 0.86 key: train_roc_auc value: [0.98854962 0.99618321 0.99236641 0.99236641 0.99236641 0.98476174 0.99242424 0.99242424 0.98863636 0.98863636] mean value: 0.9908715012722646 key: test_jcc value: [0.875 0.77777778 0.8125 0.93333333 0.6875 0.66666667 0.55555556 0.8125 0.70588235 0.76470588] mean value: 0.7591421568627451 key: train_jcc value: [0.97761194 0.99242424 0.98507463 0.98507463 0.98507463 0.97037037 0.98496241 0.98496241 0.97761194 0.97761194] mean value: 0.9820779126317225 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, 
scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01477504 0.00951862 0.00931025 0.00919795 0.00936079 0.00953674 0.01026082 0.01047182 0.00936103 0.00997782] mean value: 0.010177087783813477 key: score_time value: [0.00942469 0.00893283 0.00876451 0.00868773 0.00890875 0.00949097 0.00881505 0.00911379 0.00960183 0.00871181] mean value: 0.00904519557952881 key: test_mcc value: [0.81649658 0.60540551 0.59330823 0.67156812 0.7320658 0.44188962 0.59330823 0.51675233 0.72954522 0.51675233] mean value: 0.621709195862531 key: train_mcc value: [0.65259237 0.64668979 0.67853804 0.68974549 0.67619108 0.65131111 0.69883248 0.68267177 0.6481901 0.63570088] mean value: 0.6660463116903754 key: test_accuracy value: [0.9 0.8 0.79310345 0.82758621 0.86206897 0.68965517 0.79310345 0.75862069 0.86206897 0.75862069] mean value: 0.8044827586206896 key: train_accuracy value: [0.82442748 0.82061069 0.8365019 0.84410646 0.8365019 0.82509506 0.84790875 0.84030418 0.82129278 0.81749049] mean value: 0.8314239688851479 key: test_fscore value: [0.90909091 0.78571429 0.8 0.83870968 0.86666667 0.74285714 0.78571429 0.77419355 0.875 0.77419355] mean value: 0.8152140064236838 key: train_fscore value: [0.83333333 0.83154122 0.84697509 0.84981685 0.84476534 0.83088235 0.8540146 0.84558824 0.83154122 0.82089552] mean value: 0.8389353761517929 key: test_precision value: [0.83333333 0.84615385 0.75 0.76470588 0.8125 0.61904762 0.84615385 0.75 0.82352941 0.75 ] mean value: 0.7795423938806292 key: train_precision value: [0.79310345 0.78378378 0.79865772 0.82269504 0.80689655 0.80714286 0.81818182 0.81560284 0.78378378 0.80291971] mean value: 
0.803276754138267 key: test_recall value: [1. 0.73333333 0.85714286 0.92857143 0.92857143 0.92857143 0.73333333 0.8 0.93333333 0.8 ] mean value: 0.8642857142857143 key: train_recall value: [0.8778626 0.88549618 0.90151515 0.87878788 0.88636364 0.85606061 0.89312977 0.8778626 0.88549618 0.83969466] mean value: 0.8782269257460097 key: test_roc_auc value: [0.9 0.8 0.7952381 0.83095238 0.86428571 0.69761905 0.7952381 0.75714286 0.85952381 0.75714286] mean value: 0.8057142857142857 key: train_roc_auc value: [0.82442748 0.82061069 0.83625376 0.84397409 0.83631159 0.82497687 0.84808004 0.84044645 0.82153597 0.8175746 ] mean value: 0.8314191533657183 key: test_jcc value: [0.83333333 0.64705882 0.66666667 0.72222222 0.76470588 0.59090909 0.64705882 0.63157895 0.77777778 0.63157895] mean value: 0.6912890515057698 key: train_jcc value: [0.71428571 0.71165644 0.7345679 0.7388535 0.73125 0.71069182 0.74522293 0.73248408 0.71165644 0.69620253] mean value: 0.7226871364054945 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01238084 0.02012634 0.0197072 0.01957273 0.01937509 0.02072191 0.02095866 0.01990652 0.01836991 0.01708722] mean value: 0.018820643424987793 key: score_time value: [0.0095036 0.01113105 0.01119828 0.01179147 0.01183867 0.01185751 0.01195765 0.01185346 0.01179409 0.01182723] mean value: 0.011475300788879395 key: test_mcc value: [0.76088591 0.60540551 0.67156812 0.86965655 0.87082337 0.72954522 0.7952381 0.65714286 0.79426746 0.86965655] mean value: 0.7624189650461067 key: train_mcc value: [0.82856162 0.99239533 0.97721358 0.95447974 0.84450885 0.94745994 0.9337642 0.93286731 0.97744232 0.91766613] mean value: 0.9306359023287414 key: test_accuracy value: [0.86666667 0.8 0.82758621 0.93103448 0.93103448 0.86206897 0.89655172 0.82758621 0.89655172 0.93103448] mean value: 0.8770114942528735 key: train_accuracy value: [0.90839695 0.99618321 0.98859316 0.97718631 0.91634981 0.97338403 0.96577947 0.96577947 0.98859316 0.9581749 ] mean value: 0.9638420456854265 key: test_fscore value: [0.88235294 0.78571429 0.83870968 0.92307692 0.93333333 0.84615385 0.89655172 0.82758621 0.90322581 0.9375 ] mean value: 0.8774204744360309 key: train_fscore value: [0.91549296 0.99619772 0.98867925 0.97744361 0.92307692 0.97297297 0.96678967 0.96470588 0.98867925 0.95910781] mean value: 0.9653146028957218 key: test_precision value: [0.78947368 0.84615385 0.76470588 1. 
0.875 0.91666667 0.92857143 0.85714286 0.875 0.88235294] mean value: 0.8735067306274736 key: train_precision value: [0.8496732 0.99242424 0.98496241 0.97014925 0.85714286 0.99212598 0.93571429 0.99193548 0.97761194 0.93478261] mean value: 0.9486522264759241 key: test_recall value: [1. 0.73333333 0.92857143 0.85714286 1. 0.78571429 0.86666667 0.8 0.93333333 1. ] mean value: 0.8904761904761904 key: train_recall value: [0.99236641 1. 0.99242424 0.98484848 1. 0.95454545 1. 0.9389313 1. 0.98473282] mean value: 0.9847848716169327 key: test_roc_auc value: [0.86666667 0.8 0.83095238 0.92857143 0.93333333 0.85952381 0.89761905 0.82857143 0.8952381 0.92857143] mean value: 0.876904761904762 key: train_roc_auc value: [0.90839695 0.99618321 0.98857853 0.97715707 0.91603053 0.97345593 0.96590909 0.96567777 0.98863636 0.9582755 ] mean value: 0.9638300948415452 key: test_jcc value: [0.78947368 0.64705882 0.72222222 0.85714286 0.875 0.73333333 0.8125 0.70588235 0.82352941 0.88235294] mean value: 0.7848495626320704 key: train_jcc value: [0.84415584 0.99242424 0.97761194 0.95588235 0.85714286 0.94736842 0.93571429 0.93181818 0.97761194 0.92142857] mean value: 0.9341158637274806 MCC on Blind test: 0.67 Accuracy on Blind test: 0.81 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0153439 0.01487064 0.01535463 0.01554227 0.01502013 0.01618838 0.01572561 0.01596141 0.01561093 0.0169456 ] mean value: 0.015656352043151855 key: score_time value: [0.00961018 0.01186347 0.01176333 0.01177406 0.01178432 0.01180696 0.01184416 0.01181364 0.01184845 0.01183987] mean value: 0.011594843864440919 key: test_mcc value: [1. 0.6681531 0.59628479 0.86190476 0.86190476 0.79426746 0.72954522 0.6130103 0.7952381 0.86965655] mean value: 0.7789965054376262 key: train_mcc value: [0.90935126 0.92636711 0.90177727 0.95447974 0.90885432 0.96222382 0.85796431 0.8003837 0.68576928 0.96223033] mean value: 0.8869401136471388 key: test_accuracy value: [1. 0.83333333 0.75862069 0.93103448 0.93103448 0.89655172 0.86206897 0.79310345 0.89655172 0.93103448] mean value: 0.8833333333333333 key: train_accuracy value: [0.95419847 0.96183206 0.95057034 0.97718631 0.95437262 0.98098859 0.92395437 0.89353612 0.82509506 0.98098859] mean value: 0.9402722549560271 key: test_fscore value: [1. 0.83870968 0.8 0.92857143 0.92857143 0.88888889 0.875 0.76923077 0.89655172 0.9375 ] mean value: 0.8863023916819801 key: train_fscore value: [0.953125 0.96323529 0.95167286 0.97744361 0.95419847 0.98127341 0.92907801 0.88235294 0.79090909 0.98113208] mean value: 0.9364420768857535 key: test_precision value: [1. 
0.8125 0.66666667 0.92857143 0.92857143 0.92307692 0.82352941 0.90909091 0.92857143 0.88235294] mean value: 0.8802931137489961 key: train_precision value: [0.976 0.92907801 0.93430657 0.97014925 0.96153846 0.97037037 0.86754967 0.98130841 0.97752809 0.97014925] mean value: 0.9537978092875747 key: test_recall value: [1. 0.86666667 1. 0.92857143 0.92857143 0.85714286 0.93333333 0.66666667 0.86666667 1. ] mean value: 0.9047619047619048 key: train_recall value: [0.93129771 1. 0.96969697 0.98484848 0.9469697 0.99242424 1. 0.80152672 0.66412214 0.99236641] mean value: 0.928325237103863 key: test_roc_auc value: [1. 0.83333333 0.76666667 0.93095238 0.93095238 0.8952381 0.85952381 0.79761905 0.89761905 0.92857143] mean value: 0.8840476190476191 key: train_roc_auc value: [0.95419847 0.96183206 0.95049734 0.97715707 0.95440088 0.98094495 0.92424242 0.8931876 0.82448531 0.98103169] mean value: 0.9401977793199168 key: test_jcc value: [1. 0.72222222 0.66666667 0.86666667 0.86666667 0.8 0.77777778 0.625 0.8125 0.88235294] mean value: 0.8019852941176471 key: train_jcc value: [0.91044776 0.92907801 0.90780142 0.95588235 0.91240876 0.96323529 0.86754967 0.78947368 0.65413534 0.96296296] mean value: 0.885297525439458 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.13885784 0.13189292 0.13409328 0.13466954 0.1360693 0.13970566 0.13426328 0.13068724 0.13306546 0.13250685] mean value: 0.1345811367034912 key: score_time value: [0.01640296 0.01646113 0.01672888 0.01593828 0.01671743 0.01641703 0.01637411 0.01561213 0.01619744 0.01557469] mean value: 0.016242408752441408 key: test_mcc value: [0.93541435 0.87447463 0.86190476 1. 0.87082337 0.79426746 0.93333333 0.93302503 0.80917359 0.93333333] mean value: 0.8945749865401353 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96666667 0.93333333 0.93103448 1. 0.93103448 0.89655172 0.96551724 0.96551724 0.89655172 0.96551724] mean value: 0.9451724137931035 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96551724 0.9375 0.92857143 1. 0.93333333 0.88888889 0.96551724 0.96774194 0.90909091 0.96551724] mean value: 0.9461678219506362 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88235294 0.92857143 1. 0.875 0.92307692 1. 0.9375 0.83333333 1. ] mean value: 0.9379834626158156 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93333333 1. 0.92857143 1. 1. 0.85714286 0.93333333 1. 1. 0.93333333] mean value: 0.9585714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.93333333 0.93095238 1. 
0.93333333 0.8952381 0.96666667 0.96428571 0.89285714 0.96666667] mean value: 0.9450000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93333333 0.88235294 0.86666667 1. 0.875 0.8 0.93333333 0.9375 0.83333333 0.93333333] mean value: 0.8994852941176471 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04823947 0.04498172 0.04811335 0.04562259 0.04412913 0.0470643 0.05953717 0.04916573 0.05571675 0.05847645] mean value: 0.05010466575622559 key: score_time value: [0.02403498 0.02468944 0.02403665 0.02459049 0.03906131 0.03024817 0.0207746 0.03200412 0.03876305 0.02441049] mean value: 0.028261327743530275 key: test_mcc value: [0.93541435 0.68041382 0.86190476 0.93302503 0.93333333 0.86965655 1. 0.93302503 0.93302503 0.93333333] mean value: 0.9013131248529029 key: train_mcc value: [0.98473282 1. 1. 
0.98479065 0.9772149 0.98490544 0.98490371 0.96200555 0.95448499 0.98490544] mean value: 0.9817943520005022 key: test_accuracy value: [0.96666667 0.83333333 0.93103448 0.96551724 0.96551724 0.93103448 1. 0.96551724 0.96551724 0.96551724] mean value: 0.9489655172413793 key: train_accuracy value: [0.99236641 1. 1. 0.99239544 0.98859316 0.99239544 0.99239544 0.98098859 0.97718631 0.99239544] mean value: 0.9908716222099672 key: test_fscore value: [0.96551724 0.84848485 0.92857143 0.96296296 0.96551724 0.92307692 1. 0.96774194 0.96774194 0.96551724] mean value: 0.9495131758201836 key: train_fscore value: [0.99236641 1. 1. 0.99242424 0.98859316 0.99236641 0.99230769 0.98098859 0.97727273 0.99242424] mean value: 0.9908743477905815 key: test_precision value: [1. 0.77777778 0.92857143 1. 0.93333333 1. 1. 0.9375 0.9375 1. ] mean value: 0.951468253968254 key: train_precision value: [0.99236641 1. 1. 0.99242424 0.99236641 1. 1. 0.97727273 0.96992481 0.98496241] mean value: 0.9909317012169563 key: test_recall value: [0.93333333 0.93333333 0.92857143 0.92857143 1. 0.85714286 1. 1. 1. 0.93333333] mean value: 0.9514285714285714 key: train_recall value: [0.99236641 1. 1. 0.99242424 0.98484848 0.98484848 0.98473282 0.98473282 0.98473282 1. ] mean value: 0.9908686097617395 key: test_roc_auc value: [0.96666667 0.83333333 0.93095238 0.96428571 0.96666667 0.92857143 1. 0.96428571 0.96428571 0.96666667] mean value: 0.9485714285714286 key: train_roc_auc value: [0.99236641 1. 1. 0.99239533 0.98860745 0.99242424 0.99236641 0.98100278 0.9772149 0.99242424] mean value: 0.9908801758038399 key: test_jcc value: [0.93333333 0.73684211 0.86666667 0.92857143 0.93333333 0.85714286 1. 0.9375 0.9375 0.93333333] mean value: 0.9064223057644111 key: train_jcc value: [0.98484848 1. 1. 
0.98496241 0.97744361 0.98484848 0.98473282 0.96268657 0.95555556 0.98496241] mean value: 0.9820040337896817 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.08211589 0.08095455 0.06087279 0.03729415 0.048522 0.07543254 0.06906867 0.04290318 0.03936195 0.07277274] mean value: 0.06092984676361084 key: score_time value: [0.0236876 0.02488613 0.01395512 0.01358104 0.02060533 0.02655029 0.01827741 0.0167532 0.02412868 0.02458143] mean value: 0.020700621604919433 key: test_mcc value: [0.46666667 0.34585723 0.24285714 0.67156812 0.59330823 0.45455066 0.34975426 0.37799476 0.44932255 0.58943389] mean value: 0.4541313498424294 key: train_mcc value: [0.98473282 0.97712771 0.98479065 0.98479065 0.98479065 0.98490371 0.96958131 0.9772149 0.9772149 0.96958131] mean value: 0.979472861832087 key: test_accuracy value: [0.73333333 0.66666667 0.62068966 0.82758621 0.79310345 0.72413793 0.65517241 0.68965517 0.72413793 0.79310345] mean value: 0.7227586206896551 key: train_accuracy value: [0.99236641 0.98854962 0.99239544 0.99239544 0.99239544 0.99239544 0.98479087 0.98859316 0.98859316 0.98479087] mean value: 0.9897265840420283 key: test_fscore 
value: [0.73333333 0.61538462 0.62068966 0.83870968 0.8 0.73333333 0.58333333 0.70967742 0.75 0.8125 ] mean value: 0.7196961367331223 key: train_fscore value: [0.99236641 0.98859316 0.99242424 0.99242424 0.99242424 0.9924812 0.98473282 0.98859316 0.98859316 0.98473282] mean value: 0.9897365459029557 key: test_precision value: [0.73333333 0.72727273 0.6 0.76470588 0.75 0.6875 0.77777778 0.6875 0.70588235 0.76470588] mean value: 0.7198677956030897 key: train_precision value: [0.99236641 0.98484848 0.99242424 0.99242424 0.99242424 0.98507463 0.98473282 0.98484848 0.98484848 0.98473282] mean value: 0.9878724869752555 key: test_recall value: [0.73333333 0.53333333 0.64285714 0.92857143 0.85714286 0.78571429 0.46666667 0.73333333 0.8 0.86666667] mean value: 0.7347619047619047 key: train_recall value: [0.99236641 0.99236641 0.99242424 0.99242424 0.99242424 1. 0.98473282 0.99236641 0.99236641 0.98473282] mean value: 0.9916204024982651 key: test_roc_auc value: [0.73333333 0.66666667 0.62142857 0.83095238 0.7952381 0.72619048 0.66190476 0.68809524 0.72142857 0.79047619] mean value: 0.7235714285714286 key: train_roc_auc value: [0.99236641 0.98854962 0.99239533 0.99239533 0.99239533 0.99236641 0.98479065 0.98860745 0.98860745 0.98479065] mean value: 0.9897264631043257 key: test_jcc value: [0.57894737 0.44444444 0.45 0.72222222 0.66666667 0.57894737 0.41176471 0.55 0.6 0.68421053] mean value: 0.5687203302373581 key: train_jcc value: [0.98484848 0.97744361 0.98496241 0.98496241 0.98496241 0.98507463 0.96992481 0.97744361 0.97744361 0.96992481] mean value: 0.9796990780887088 MCC on Blind test: 0.39 Accuracy on Blind test: 0.69 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.44738221 0.46150136 0.46548247 0.45794606 0.42596221 0.4200623 0.44993877 0.42784166 0.42434502 0.43154621] mean value: 0.44120082855224607 key: score_time value: [0.0106287 0.01016116 0.01001334 0.0097239 0.00935102 0.00991964 0.00951171 0.01013899 0.01051497 0.00930452] mean value: 0.009926795959472656 key: test_mcc value: [0.87447463 0.76088591 0.86190476 1. 0.93333333 1. 1. 0.86965655 0.86965655 0.93333333] mean value: 0.9103245077976663 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.86666667 0.93103448 1. 0.96551724 1. 1. 0.93103448 0.93103448 0.96551724] mean value: 0.9524137931034483 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92857143 0.88235294 0.92857143 1. 0.96551724 1. 1. 0.9375 0.9375 0.96551724] mean value: 0.9545530281077949 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.78947368 0.92857143 1. 0.93333333 1. 1. 0.88235294 0.88235294 1. ] mean value: 0.9416084328468229 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86666667 1. 0.92857143 1. 1. 1. 1. 1. 1. 0.93333333] mean value: 0.9728571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93333333 0.86666667 0.93095238 1. 0.96666667 1. 1. 0.92857143 0.92857143 0.96666667] mean value: 0.9521428571428572 key: train_roc_auc value: [1. 1. 1. 1. 1. 
1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86666667 0.78947368 0.86666667 1. 0.93333333 1. 1. 0.88235294 0.88235294 0.93333333] mean value: 0.9154179566563467 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02211094 0.03654742 0.03356147 0.04863787 0.05131888 0.02375007 0.02474117 0.02334833 0.02405477 0.0240171 ] mean value: 0.03120880126953125 key: score_time value: [0.01956391 0.01965404 0.01273465 0.0126214 0.0166223 0.01554108 0.01730466 0.01577425 0.01530719 0.0165956 ] mean value: 0.016171908378601073 key: test_mcc value: [0.26726124 0.15075567 0.38368877 0.26533187 0.28749445 0.17703552 0.38095238 0.02898855 0.24688536 0.46057608] mean value: 0.26489698985225885 key: train_mcc value: [0.87085548 0.84404875 0.97743845 0.81830918 0.81183809 0.83790995 0.71853328 0.89179531 0.65867038 0.96267809] mean value: 0.8392076957241592 key: test_accuracy value: [0.63333333 0.56666667 0.65517241 0.62068966 0.62068966 0.5862069 0.68965517 0.51724138 0.62068966 0.72413793] mean value: 0.623448275862069 key: train_accuracy value: [0.93129771 0.91603053 0.98859316 0.90114068 0.8973384 0.91254753 0.84030418 0.94296578 0.80228137
0.98098859] mean value: 0.91134879400923 key: test_fscore value: [0.64516129 0.64864865 0.72222222 0.66666667 0.68571429 0.6 0.68965517 0.5625 0.68571429 0.76470588] mean value: 0.6670988454055424 key: train_fscore value: [0.93571429 0.92253521 0.98876404 0.91034483 0.90721649 0.91986063 0.86184211 0.94584838 0.8343949 0.98127341] mean value: 0.92077942849477 key: test_precision value: [0.625 0.54545455 0.59090909 0.57894737 0.57142857 0.5625 0.71428571 0.52941176 0.6 0.68421053] mean value: 0.6002147581520647 key: train_precision value: [0.87919463 0.85620915 0.97777778 0.83544304 0.83018868 0.8516129 0.75722543 0.89726027 0.71584699 0.96323529] mean value: 0.8563994175574612 key: test_recall value: [0.66666667 0.8 0.92857143 0.78571429 0.85714286 0.64285714 0.66666667 0.6 0.8 0.86666667] mean value: 0.7614285714285715 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63333333 0.56666667 0.66428571 0.62619048 0.62857143 0.58809524 0.69047619 0.51428571 0.61428571 0.71904762] mean value: 0.6245238095238095 key: train_roc_auc value: [0.93129771 0.91603053 0.98854962 0.90076336 0.89694656 0.91221374 0.84090909 0.94318182 0.8030303 0.98106061] mean value: 0.9113983344899376 key: test_jcc value: [0.47619048 0.48 0.56521739 0.5 0.52173913 0.42857143 0.52631579 0.39130435 0.52173913 0.61904762] mean value: 0.5030125313283208 key: train_jcc value: [0.87919463 0.85620915 0.97777778 0.83544304 0.83018868 0.8516129 0.75722543 0.89726027 0.71584699 0.96323529] mean value: 0.8563994175574612 MCC on Blind test: 0.14 Accuracy on Blind test: 0.64 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01498985 0.01466179 0.03298283 0.03653455 0.01458478 0.01477385 0.01454568 0.01460361 0.03366661 0.03188801] mean value: 0.02232315540313721 key: score_time value: [0.01229572 0.01209068 0.01755023 0.02199912 0.01208687 0.01216507 0.01210165 0.01222658 0.02346396 0.02577353] mean value: 0.016175341606140137 key: test_mcc value: [0.93541435 0.73994007 0.86190476 1. 0.93333333 0.65714286 0.81167945 0.7952381 0.79426746 0.93302503] mean value: 0.8461945416676242 key: train_mcc value: [0.94791916 0.96253342 0.9553594 0.94810134 0.9553594 0.9553594 0.95537456 0.95537456 0.95537456 0.94812183] mean value: 0.9538877616323723 key: test_accuracy value: [0.96666667 0.86666667 0.93103448 1. 0.96551724 0.82758621 0.89655172 0.89655172 0.89655172 0.96551724] mean value: 0.921264367816092 key: train_accuracy value: [0.97328244 0.98091603 0.97718631 0.97338403 0.97718631 0.97718631 0.97718631 0.97718631 0.97718631 0.97338403] mean value: 0.9764084404841378 key: test_fscore value: [0.96774194 0.875 0.92857143 1. 
0.96551724 0.82758621 0.88888889 0.89655172 0.90322581 0.96774194] mean value: 0.9220825167293466 /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:168: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_fscore value: [0.9739777 0.98127341 0.97777778 0.97416974 0.97777778 0.97777778 0.97761194 0.97761194 0.97761194 0.9739777 ] mean value: 0.9769567694500546 key: test_precision value: [0.9375 0.82352941 0.92857143 1. 0.93333333 0.8 1. 0.92857143 0.875 0.9375 ] mean value: 0.9164005602240897 key: train_precision value: [0.94927536 0.96323529 0.95652174 0.94964029 0.95652174 0.95652174 0.95620438 0.95620438 0.95620438 0.94927536] mean value: 0.9549604662602549 key: test_recall value: [1. 0.93333333 0.92857143 1. 1. 0.85714286 0.8 0.86666667 0.93333333 1. ] mean value: 0.9319047619047619 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.86666667 0.93095238 1. 0.96666667 0.82857143 0.9 0.89761905 0.8952381 0.96428571] mean value: 0.9216666666666667 key: train_roc_auc value: [0.97328244 0.98091603 0.97709924 0.97328244 0.97709924 0.97709924 0.97727273 0.97727273 0.97727273 0.97348485] mean value: 0.9764081656257229 key: test_jcc value: [0.9375 0.77777778 0.86666667 1. 
0.93333333 0.70588235 0.8 0.8125 0.82352941 0.9375 ] mean value: 0.859468954248366 key: train_jcc value: [0.94927536 0.96323529 0.95652174 0.94964029 0.95652174 0.95652174 0.95620438 0.95620438 0.95620438 0.94927536] mean value: 0.9549604662602549 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.128407 0.19808292 0.22386646 0.24572563 0.35035658 0.2539382 0.24668813 0.2884469 0.25687814 0.14638782] mean value: 0.2338777780532837 key: score_time value: [0.01221538 0.01226711 0.02225232 0.02302647 0.02310872 0.02319145 0.02282047 0.02049422 0.02153277 0.01216483] mean value: 0.019307374954223633 key: test_mcc value: [0.93541435 0.73994007 0.86190476 1. 0.93333333 0.65714286 0.81167945 0.7952381 0.79426746 0.93302503] mean value: 0.8461945416676242 key: train_mcc value: [0.94791916 0.96253342 0.9553594 0.94810134 0.9553594 0.9553594 0.95537456 0.95537456 0.95537456 0.94812183] mean value: 0.9538877616323723 key: test_accuracy value: [0.96666667 0.86666667 0.93103448 1. 
0.96551724 0.82758621 0.89655172 0.89655172 0.89655172 0.96551724] mean value: 0.921264367816092 key: train_accuracy value: [0.97328244 0.98091603 0.97718631 0.97338403 0.97718631 0.97718631 0.97718631 0.97718631 0.97718631 0.97338403] mean value: 0.9764084404841378 key: test_fscore value: [0.96774194 0.875 0.92857143 1. 0.96551724 0.82758621 0.88888889 0.89655172 0.90322581 0.96774194] mean value: 0.9220825167293466 key: train_fscore value: [0.9739777 0.98127341 0.97777778 0.97416974 0.97777778 0.97777778 0.97761194 0.97761194 0.97761194 0.9739777 ] mean value: 0.9769567694500546 key: test_precision value: [0.9375 0.82352941 0.92857143 1. 0.93333333 0.8 1. 0.92857143 0.875 0.9375 ] mean value: 0.9164005602240897 key: train_precision value: [0.94927536 0.96323529 0.95652174 0.94964029 0.95652174 0.95652174 0.95620438 0.95620438 0.95620438 0.94927536] mean value: 0.9549604662602549 key: test_recall value: [1. 0.93333333 0.92857143 1. 1. 0.85714286 0.8 0.86666667 0.93333333 1. ] mean value: 0.9319047619047619 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96666667 0.86666667 0.93095238 1. 0.96666667 0.82857143 0.9 0.89761905 0.8952381 0.96428571] mean value: 0.9216666666666667 key: train_roc_auc value: [0.97328244 0.98091603 0.97709924 0.97328244 0.97709924 0.97709924 0.97727273 0.97727273 0.97727273 0.97348485] mean value: 0.9764081656257229 key: test_jcc value: [0.9375 0.77777778 0.86666667 1. 
0.93333333 0.70588235 0.8 0.8125 0.82352941 0.9375 ] mean value: 0.859468954248366 key: train_jcc value: [0.94927536 0.96323529 0.95652174 0.94964029 0.95652174 0.95652174 0.95620438 0.95620438 0.95620438 0.94927536] mean value: 0.9549604662602549 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02885795 0.04506516 0.03932476 0.03861547 0.06617665 0.04848981 0.07225323 0.03943682 0.03743815 0.03896904] mean value: 0.04546270370483398 key: score_time value: [0.01223993 0.02304626 0.01439619 0.01436543 0.02252102 0.01312566 0.02620292 0.0147469 0.01469874 0.01533699] mean value: 0.017068004608154295 key: test_mcc value: [0.8951918 0.86189955 0.75808552 0.82512315 0.7589669 0.75808552 0.82512315 0.85960591 0.8951918 0.85960591] mean value: 0.8296879220906809 key: train_mcc value: [0.90270158 0.92982429 0.9259873 0.88314434 0.90686795 0.90687923 0.91033728 0.90687923 0.9299395 0.93765105] mean value: 0.9140211735228061 key: test_accuracy value: [0.94736842 0.92982456 0.87719298 0.9122807 0.87719298 0.87719298 0.9122807 0.92982456 0.94736842 0.92982456] mean value: 0.9140350877192982 key: train_accuracy value: [0.95126706 0.96491228 0.96296296 0.94152047 0.95321637 0.95321637 0.95516569 
0.95321637 0.96491228 0.96881092] mean value: 0.9569200779727095 key: test_fscore value: [0.94545455 0.93103448 0.86792453 0.9122807 0.88135593 0.8852459 0.9122807 0.93103448 0.94915254 0.93103448] mean value: 0.9146798301756682 key: train_fscore value: [0.95183044 0.96498054 0.96324952 0.94208494 0.95402299 0.95384615 0.95499022 0.95384615 0.96511628 0.9688716 ] mean value: 0.9572838832295703 key: test_precision value: [0.96296296 0.9 0.92 0.89655172 0.83870968 0.84375 0.92857143 0.93103448 0.93333333 0.93103448] mean value: 0.9085948091942252 key: train_precision value: [0.94274809 0.96498054 0.95769231 0.9348659 0.93962264 0.93939394 0.95686275 0.93939394 0.95769231 0.96511628] mean value: 0.9498368696583012 key: test_recall value: [0.92857143 0.96428571 0.82142857 0.92857143 0.92857143 0.93103448 0.89655172 0.93103448 0.96551724 0.93103448] mean value: 0.9226600985221675 key: train_recall value: [0.96108949 0.96498054 0.9688716 0.94941634 0.9688716 0.96875 0.953125 0.96875 0.97265625 0.97265625] mean value: 0.9649167071984436 key: test_roc_auc value: [0.94704433 0.93041872 0.87623153 0.91256158 0.87807882 0.87623153 0.91256158 0.92980296 0.94704433 0.92980296] mean value: 0.9139778325123153 key: train_roc_auc value: [0.95124787 0.96491215 0.96295142 0.94150505 0.9531858 0.9532466 0.95516172 0.9532466 0.96492735 0.9688184 ] mean value: 0.9569202942607005 key: test_jcc value: [0.89655172 0.87096774 0.76666667 0.83870968 0.78787879 0.79411765 0.83870968 0.87096774 0.90322581 0.87096774] mean value: 0.8438763212838983 key: train_jcc value: [0.90808824 0.93233083 0.92910448 0.89051095 0.91208791 0.91176471 0.91385768 0.91176471 0.93258427 0.93962264] mean value: 0.9181716401806431 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to 
converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.90133214 0.98307157 0.94539571 0.88999653 1.07535005 0.92793036 1.03368664 0.91905904 0.98550344 0.9788034 ]
mean value: 0.964012885093689

key: score_time
value: [0.01478505 0.01508641 0.01475453 0.01491857 0.01558185 0.0156281 0.01624322 0.01675773 0.01718378 0.01678681]
mean value: 0.01577260494232178

key: test_mcc
value: [0.86789789 0.9321832 0.8615634 0.93202124 0.8951918 0.96551724 0.82512315 0.89988258 1. 0.85960591]
mean value: 0.9038986422320657

key: train_mcc
value: [0.98831147 0.9922027 1. 0.98443509 0.98831147 1. 0.98831165 1. 0.98443556 0.99610889]
mean value: 0.9922116831378749

key: test_accuracy
value: [0.92982456 0.96491228 0.92982456 0.96491228 0.94736842 0.98245614 0.9122807 0.94736842 1. 0.92982456]
mean value: 0.9508771929824561

key: train_accuracy
value: [0.99415205 0.99610136 1. 0.99220273 0.99415205 1. 0.99415205 1.
 0.99220273 0.99805068]
mean value: 0.9961013645224172

key: test_fscore
value: [0.92307692 0.96551724 0.92592593 0.96296296 0.94545455 0.98245614 0.9122807 0.94545455 1. 0.93103448]
mean value: 0.9494163469118098

key: train_fscore
value: [0.99417476 0.99610895 1. 0.99224806 0.99417476 1. 0.99415205 1. 0.9922179 0.99804305]
mean value: 0.9961119524448837

key: test_precision
value: [1. 0.93333333 0.96153846 1. 0.96296296 1. 0.92857143 1. 1. 0.93103448]
mean value: 0.9717440669164807

key: train_precision
value: [0.99224806 0.99610895 1. 0.98841699 0.99224806 1. 0.9922179 1. 0.98837209 1. ]
mean value: 0.9949612053720279

key: test_recall
value: [0.85714286 1. 0.89285714 0.92857143 0.92857143 0.96551724 0.89655172 0.89655172 1. 0.93103448]
mean value: 0.929679802955665

key: train_recall
value: [0.99610895 0.99610895 1. 0.99610895 0.99610895 1. 0.99609375 1. 0.99609375 0.99609375]
mean value: 0.9972717047665369

key: test_roc_auc
value: [0.92857143 0.96551724 0.92918719 0.96428571 0.94704433 0.98275862 0.91256158 0.94827586 1. 0.92980296]
mean value: 0.9508004926108374

key: train_roc_auc
value: [0.99414822 0.99610135 1. 0.9921951 0.99414822 1. 0.99415582 1. 0.9922103 0.99804688]
mean value: 0.996100589737354

key: test_jcc
value: [0.85714286 0.93333333 0.86206897 0.92857143 0.89655172 0.96551724 0.83870968 0.89655172 1. 0.87096774]
mean value: 0.9049414693574872

key: train_jcc
value: [0.98841699 0.99224806 1. 0.98461538 0.98841699 1. 0.98837209 1. 0.98455598 0.99609375]
mean value: 0.9922719251044105

MCC on Blind test: 0.77
Accuracy on Blind test: 0.89

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting',
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])

key: fit_time
value: [0.01561379 0.01096916 0.01066613 0.01034665 0.01063323 0.01133728 0.01132131 0.01226974 0.01155472 0.01094699]
mean value: 0.011565899848937989

key: score_time
value: [0.01313066 0.00961709 0.00921512 0.0092473 0.00938988 0.01334214 0.0097208 0.00976133 0.00919127 0.00969601]
mean value: 0.01023116111755371

key: test_mcc
value: [0.40394089 0.57536175 0.55091314 0.57881773 0.65104858 0.57073542 0.55091314 0.61589458 0.58358651 0.47348988]
mean value: 0.5554701616909862

key: train_mcc
value: [0.59070488 0.60335508 0.59951056 0.64777118 0.67228322 0.61705269 0.65940526 0.63436328 0.63208504 0.59569245]
mean value: 0.625222364018471

key: test_accuracy
value: [0.70175439 0.77192982 0.77192982 0.78947368 0.8245614 0.77192982 0.77192982 0.78947368 0.78947368 0.73684211]
mean value: 0.7719298245614035

key: train_accuracy
value: [0.79337232 0.79922027 0.79922027 0.82261209 0.83430799 0.80506823 0.82846004 0.81481481 0.8128655 0.79532164]
mean value: 0.8105263157894737

key: test_fscore
value: [0.70175439 0.8 0.74509804 0.78571429 0.82758621 0.80597015 0.79365079 0.82352941 0.80645161 0.74576271]
mean value: 0.78355175972283

key: train_fscore
value: [0.80514706 0.81170018 0.79358717 0.83054004 0.84288355 0.81818182 0.83520599 0.82504604 0.82481752 0.80733945]
mean value: 0.819444882121119

key: test_precision
value: [0.68965517 0.7027027 0.82608696 0.78571429 0.8 0.71052632 0.73529412 0.71794872 0.75757576 0.73333333]
mean value: 0.7458837359646862

key: train_precision
value: [0.7630662 0.76551724 0.81818182 0.79642857 0.8028169 0.76530612 0.80215827 0.7804878 0.7739726 0.76124567]
mean value: 0.7829181212677276

key: test_recall
value: [0.71428571 0.92857143 0.67857143 0.78571429 0.85714286 0.93103448 0.86206897 0.96551724 0.86206897 0.75862069]
mean value: 0.8343596059113301

key: train_recall
value: [0.85214008 0.86381323 0.77042802 0.86770428 0.88715953 0.87890625 0.87109375 0.875 0.8828125 0.859375 ]
mean value: 0.860843263618677

key: test_roc_auc
value: [0.70197044 0.77463054 0.7703202 0.78940887 0.82512315 0.76908867 0.7703202 0.78633005 0.78817734 0.7364532 ]
mean value: 0.7711822660098522

key: train_roc_auc
value: [0.79325754 0.79909411 0.79927651 0.82252402 0.83420477 0.80521188 0.82854298 0.81493191 0.81300158 0.79544625]
mean value: 0.8105491549124514

key: test_jcc
value: [0.54054054 0.66666667 0.59375 0.64705882 0.70588235 0.675 0.65789474 0.7 0.67567568 0.59459459]
mean value: 0.6457063390790171

key: train_jcc
value: [0.67384615 0.68307692 0.65780731 0.71019108 0.7284345 0.69230769 0.7170418 0.70219436 0.70186335 0.67692308]
mean value: 0.6943686254765951

MCC on Blind test: 0.72
Accuracy on Blind test: 0.86

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree',
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())])

key: fit_time
value: [0.011518 0.01054788 0.01172829 0.01120687 0.0108285 0.01070571 0.0116632 0.01078939 0.01071167 0.01073694]
mean value: 0.011043643951416016

key: score_time
value: [0.00911999 0.00915504 0.00916815 0.00972795 0.00929093 0.00973797 0.00984049 0.00906992 0.00910163 0.00909853]
mean value: 0.009331059455871583

key: test_mcc
value: [0.65104858 0.58076493 0.7257422 0.65018988 0.61453202 0.54377353 0.44019762 0.43881637 0.61453202 0.68736396]
mean value: 0.594696112208837

key: train_mcc
value: [0.653245 0.66081987 0.64199455 0.65105088 0.66139765 0.67260512 0.66473119 0.64928315 0.64523042 0.6456446 ]
mean value: 0.6546002422198144

key: test_accuracy
value: [0.8245614 0.78947368 0.85964912 0.8245614 0.80701754 0.77192982 0.71929825 0.71929825 0.80701754 0.84210526]
mean value: 0.7964912280701755

key: train_accuracy
value: [0.82651072 0.83040936 0.82066277 0.8245614 0.83040936 0.83625731 0.83235867 0.8245614 0.82261209 0.82261209]
mean value: 0.8270955165692008

key: test_fscore
value: [0.82758621 0.79310345 0.84615385 0.81481481 0.80701754 0.77966102 0.71428571 0.73333333 0.80701754 0.85245902]
mean value: 0.7975432484822016

key: train_fscore
value: [0.82917466 0.83106796 0.82509506 0.83146067 0.83428571 0.8372093 0.83137255 0.82213439 0.82261209 0.82533589]
mean value: 0.8289748287731117

key: test_precision
value: [0.8 0.76666667 0.91666667 0.84615385 0.79310345 0.76666667 0.74074074 0.70967742 0.82142857 0.8125 ]
mean value: 0.7973604025953859

key: train_precision
value: [0.81818182 0.82945736 0.80669145 0.80144404 0.81716418 0.83076923 0.83464567 0.832 0.82101167 0.81132075]
mean value: 0.8202686182692108 key: test_recall value: [0.85714286 0.82142857 0.78571429 0.78571429 0.82142857 0.79310345 0.68965517 0.75862069 0.79310345 0.89655172] mean value: 0.8002463054187192 key: train_recall value: [0.84046693 0.83268482 0.84435798 0.86381323 0.85214008 0.84375 0.828125 0.8125 0.82421875 0.83984375] mean value: 0.8381900535019455 key: test_roc_auc value: [0.82512315 0.79002463 0.85837438 0.82389163 0.80726601 0.77155172 0.71982759 0.71859606 0.80726601 0.841133 ] mean value: 0.7963054187192118 key: train_roc_auc value: [0.82648346 0.83040491 0.82061649 0.82448474 0.83036691 0.83627189 0.83235044 0.82453794 0.82261521 0.82264561] mean value: 0.8270777602140078 key: test_jcc value: [0.70588235 0.65714286 0.73333333 0.6875 0.67647059 0.63888889 0.55555556 0.57894737 0.67647059 0.74285714] mean value: 0.6653048675610596 key: train_jcc value: [0.70819672 0.71096346 0.70226537 0.71153846 0.71568627 0.72 0.7114094 0.69798658 0.6986755 0.70261438] mean value: 0.7079336133605598 MCC on Blind test: 0.47 Accuracy on Blind test: 0.72 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01020646 0.01080585 0.01096845 0.0110321 0.01112604 0.01121211 0.01174545 0.01151824 0.01136279 0.01082182] mean value: 0.011079931259155273 key: score_time value: [0.01764226 0.01442289 0.01384616 0.01482534 0.01364756 0.01321912 0.01372838 0.01377702 0.01924062 0.01841283] mean value: 0.015276217460632324 key: test_mcc value: [0.43842365 0.47348988 0.47348988 0.51250867 0.47519927 0.58562417 0.38672631 0.33292257 0.44418104 0.37345948] mean value: 0.4496024911170981 key: train_mcc value: [0.67328032 0.68809535 0.68846183 0.68084694 0.67723397 0.68271507 0.69060158 0.68277413 0.674131 0.69135529] mean value: 0.6829495483259833 key: test_accuracy value: [0.71929825 0.73684211 0.73684211 0.75438596 0.73684211 0.78947368 0.68421053 0.66666667 0.71929825 0.68421053] mean value: 0.7228070175438597 key: train_accuracy value: [0.83625731 0.84210526 0.84405458 0.83820663 0.83625731 0.83820663 0.84405458 0.84015595 0.83625731 0.84210526] mean value: 0.839766081871345 key: test_fscore value: [0.71428571 0.72727273 0.72727273 0.73076923 0.71698113 0.77777778 0.64 0.6779661 0.7037037 0.66666667] mean value: 0.7082695781518935 key: train_fscore value: [0.83266932 0.83367556 0.84189723 0.82886598 0.82644628 0.82599581 0.83673469 0.83265306 0.82995951 0.82947368] mean value: 0.8318371141576139 key: test_precision value: [0.71428571 0.74074074 0.74074074 0.79166667 0.76 0.84 0.76190476 0.66666667 0.76 0.72 ] mean value: 0.7496005291005291 key: train_precision value: [0.85306122 0.8826087 0.85542169 0.88157895 0.88105727 0.89140271 0.87606838 0.87179487 0.86134454 0.89954338] 
mean value: 0.8753881702585781 key: test_recall value: [0.71428571 0.71428571 0.71428571 0.67857143 0.67857143 0.72413793 0.55172414 0.68965517 0.65517241 0.62068966] mean value: 0.6741379310344828 key: train_recall value: [0.81322957 0.78988327 0.82879377 0.78210117 0.77821012 0.76953125 0.80078125 0.796875 0.80078125 0.76953125] mean value: 0.7929717898832684 key: test_roc_auc value: [0.71921182 0.7364532 0.7364532 0.75307882 0.73583744 0.79064039 0.68657635 0.66625616 0.72044335 0.68534483] mean value: 0.7230295566502463 key: train_roc_auc value: [0.83630229 0.84220726 0.84408439 0.83831621 0.83637068 0.83807302 0.84397039 0.84007174 0.83618829 0.84196407] mean value: 0.8397548334143968 key: test_jcc value: [0.55555556 0.57142857 0.57142857 0.57575758 0.55882353 0.63636364 0.47058824 0.51282051 0.54285714 0.5 ] mean value: 0.5495623330917448 key: train_jcc value: [0.71331058 0.71478873 0.72696246 0.70774648 0.70422535 0.70357143 0.71929825 0.71328671 0.70934256 0.70863309] mean value: 0.7121165642473934 MCC on Blind test: 0.12 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02768993 0.02304482 0.02387285 0.02293777 0.02385044 0.02296257 0.0241065 0.02292609 0.02482152 0.0247345 ] mean value: 0.024094700813293457 key: score_time value: [0.01283813 0.0125947 0.01315451 0.01290488 0.01273608 0.01245785 0.01373458 0.01328564 0.0125618 0.01255894] mean value: 0.012882709503173828 key: test_mcc value: [0.72706729 0.70694956 0.68472906 0.75462449 0.68850906 0.79682005 0.68434084 0.7257422 0.80685836 0.68736396] mean value: 0.7263004883826937 key: train_mcc value: [0.80629722 0.81344801 0.7991838 0.8284734 0.82487701 0.81647956 0.82136234 0.82736541 0.82494487 0.81710876] mean value: 0.8179540383128404 key: test_accuracy value: [0.85964912 0.84210526 0.84210526 0.87719298 0.84210526 0.89473684 0.84210526 0.85964912 0.89473684 0.84210526] mean value: 0.8596491228070176 key: train_accuracy value: [0.9005848 0.90448343 0.89668616 0.9122807 0.91033138 0.90643275 0.90838207 0.9122807 0.91033138 0.90643275] mean value: 0.90682261208577 key: test_fscore value: [0.86666667 0.85714286 0.84210526 0.87272727 0.84745763 0.90322581 0.84745763 0.87096774 0.90625 0.85245902] mean value: 0.8666459878712518 key: train_fscore value: [0.90607735 0.90942699 0.90275229 0.91651206 0.91481481 0.91044776 0.91280148 0.91557223 0.91449814 0.91078067] mean value: 0.9113683791367706 key: test_precision value: [0.8125 0.77142857 0.82758621 0.88888889 0.80645161 0.84848485 0.83333333 0.81818182 0.82857143 0.8125 ] mean value: 0.8247926708688667 key: train_precision value: [0.86013986 0.86619718 0.85416667 0.87588652 0.87279152 0.87142857 0.86925795 0.88086643 0.87234043 
0.86879433] mean value: 0.8691869453886878 key: test_recall value: [0.92857143 0.96428571 0.85714286 0.85714286 0.89285714 0.96551724 0.86206897 0.93103448 1. 0.89655172] mean value: 0.9155172413793103 key: train_recall value: [0.95719844 0.95719844 0.95719844 0.96108949 0.96108949 0.953125 0.9609375 0.953125 0.9609375 0.95703125] mean value: 0.9578930569066149 key: test_roc_auc value: [0.86083744 0.84421182 0.84236453 0.87684729 0.8429803 0.89347291 0.84174877 0.85837438 0.89285714 0.841133 ] mean value: 0.8594827586206897 key: train_roc_auc value: [0.90047422 0.90438047 0.89656797 0.91218537 0.91023225 0.90652359 0.90848431 0.91236017 0.91042984 0.90653119] mean value: 0.9068169382295721 key: test_jcc value: [0.76470588 0.75 0.72727273 0.77419355 0.73529412 0.82352941 0.73529412 0.77142857 0.82857143 0.74285714] mean value: 0.7653146947928732 key: train_jcc value: [0.82828283 0.83389831 0.82274247 0.84589041 0.84300341 0.83561644 0.83959044 0.84429066 0.84246575 0.83617747] mean value: 0.8371958199521154 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.05961466 2.23099542 2.19438291 2.01152205 2.39882874 2.11831498 2.21130466 2.09625578 2.12573075 2.13453245] mean value: 2.158148241043091 key: score_time value: [0.01481295 0.01604438 0.01651335 0.01263881 0.01380038 0.01449418 0.01461935 0.01480651 0.01514316 0.02394223] mean value: 0.015681529045104982 key: test_mcc value: [0.8615634 0.92980296 0.96547546 0.92980296 0.8951918 0.96551724 0.8953202 0.86189955 0.96547546 0.78940887] mean value: 0.9059457880082712 key: train_mcc value: [0.99610895 0.99610895 1. 1. 1. 1. 0.99610889 0.99610889 1. 0.99610889] mean value: 0.9980544569999117 key: test_accuracy value: [0.92982456 0.96491228 0.98245614 0.96491228 0.94736842 0.98245614 0.94736842 0.92982456 0.98245614 0.89473684] mean value: 0.9526315789473684 key: train_accuracy value: [0.99805068 0.99805068 1. 1. 1. 1. 0.99805068 0.99805068 1. 0.99805068] mean value: 0.9990253411306043 key: test_fscore value: [0.92592593 0.96428571 0.98181818 0.96428571 0.94545455 0.98245614 0.94736842 0.92857143 0.98305085 0.89655172] mean value: 0.9519768643340577 key: train_fscore value: [0.99805068 0.99805068 1. 1. 1. 1. 0.99804305 0.99804305 1. 
0.99804305] mean value: 0.9990230523035137 key: test_precision value: [0.96153846 0.96428571 1. 0.96428571 0.96296296 1. 0.96428571 0.96296296 0.96666667 0.89655172] mean value: 0.9643539921126127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.89285714 0.96428571 0.96428571 0.96428571 0.92857143 0.96551724 0.93103448 0.89655172 1. 0.89655172] mean value: 0.9403940886699508 key: train_recall value: [0.99610895 0.99610895 1. 1. 1. 1. 0.99609375 0.99609375 1. 0.99609375] mean value: 0.9980499148832684 key: test_roc_auc value: [0.92918719 0.96490148 0.98214286 0.96490148 0.94704433 0.98275862 0.9476601 0.93041872 0.98214286 0.89470443] mean value: 0.9525862068965518 key: train_roc_auc value: [0.99805447 0.99805447 1. 1. 1. 1. 0.99804688 0.99804688 1. 0.99804688] mean value: 0.9990249574416342 key: test_jcc value: [0.86206897 0.93103448 0.96428571 0.93103448 0.89655172 0.96551724 0.9 0.86666667 0.96666667 0.8125 ] mean value: 0.9096325944170772 key: train_jcc value: [0.99610895 0.99610895 1. 1. 1. 1. 0.99609375 0.99609375 1. 
0.99609375] mean value: 0.9980499148832684 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02568269 0.02369571 0.01899219 0.01885223 0.02013445 0.0189333 0.01874685 0.01869583 0.01888299 0.01789403] mean value: 0.020051026344299318 key: score_time value: [0.01234221 0.00959468 0.00914192 0.00899601 0.00909853 0.00905037 0.00905776 0.00910354 0.00909448 0.00924468] mean value: 0.009472417831420898 key: test_mcc value: [0.92980296 0.96551724 0.96551724 0.96547546 0.8951918 0.96551724 0.96551724 0.9321832 0.92980296 0.85960591] mean value: 0.937413124669358 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.98245614 0.98245614 0.98245614 0.94736842 0.98245614 0.98245614 0.96491228 0.96491228 0.92982456] mean value: 0.968421052631579 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.98245614 0.98245614 0.98181818 0.94545455 0.98245614 0.98245614 0.96428571 0.96551724 0.93103448] mean value: 0.9682220441385595 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96428571 0.96551724 0.96551724 1. 0.96296296 1. 1. 1. 0.96551724 0.93103448] mean value: 0.9754834884145229 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 1. 1. 0.96428571 0.92857143 0.96551724 0.96551724 0.93103448 0.96551724 0.93103448] mean value: 0.961576354679803 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96490148 0.98275862 0.98275862 0.98214286 0.94704433 0.98275862 0.98275862 0.96551724 0.96490148 0.92980296] mean value: 0.9685344827586208 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.96551724 0.96551724 0.96428571 0.89655172 0.96551724 0.96551724 0.93103448 0.93333333 0.87096774] mean value: 0.9389276444726945 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12201738 0.12300944 0.12244582 0.12890625 0.13370728 0.12423348 0.12228918 0.12689996 0.13040471 0.1344378 ] mean value: 0.12683513164520263 key: score_time value: [0.01851535 0.01844716 0.01833797 0.02005148 0.01993847 0.01833677 0.0182426 0.01870298 0.01829839 0.01937294] mean value: 0.0188244104385376 key: test_mcc value: [0.86189955 0.96551724 0.96551724 0.93202124 0.82490815 0.92980296 0.92980296 0.8953202 0.93202124 0.79110556] mean value: 0.9027916331150521 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92982456 0.98245614 0.98245614 0.96491228 0.9122807 0.96491228 0.96491228 0.94736842 0.96491228 0.89473684] mean value: 0.9508771929824561 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93103448 0.98245614 0.98245614 0.96296296 0.90909091 0.96551724 0.96551724 0.94736842 0.96666667 0.9 ] mean value: 0.9513070205992166 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.96551724 0.96551724 1. 0.92592593 0.96551724 0.96551724 0.96428571 0.93548387 0.87096774] mean value: 0.9458732218632108 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 1. 1. 0.92857143 0.89285714 0.96551724 0.96551724 0.93103448 1. 0.93103448] mean value: 0.9578817733990148 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.93041872 0.98275862 0.98275862 0.96428571 0.91194581 0.96490148 0.96490148 0.9476601 0.96428571 0.89408867] mean value: 0.9508004926108374 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.87096774 0.96551724 0.96551724 0.92857143 0.83333333 0.93333333 0.93333333 0.9 0.93548387 0.81818182] mean value: 0.9084239342415094 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.7 Accuracy on Blind test: 0.86 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01195002 0.01176 0.01078629 0.01067519 0.0106318 0.01081967 0.01062989 0.01064992 0.01090121 0.01058793] mean value: 0.010939192771911622 key: score_time value: [0.00949717 0.00912499 0.00906086 0.00918531 0.00918293 0.00911808 0.00913429 0.00925946 0.00899553 0.00907516] mean value: 0.009163379669189453 key: test_mcc value: [0.38056438 0.75808552 0.65466436 0.7257422 0.61453202 0.72706729 0.72242731 0.79778885 0.56277738 0.65104858] mean value: 0.659469790472868 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.68421053 0.87719298 0.8245614 0.85964912 0.80701754 0.85964912 0.84210526 0.89473684 0.77192982 0.8245614 ] mean value: 0.8245614035087719 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.625 0.86792453 0.80769231 0.84615385 0.80701754 0.85185185 0.81632653 0.88888889 0.74509804 0.82142857] mean value: 0.8077382108004934 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.92 0.875 0.91666667 0.79310345 0.92 1. 0.96 0.86363636 0.85185185] mean value: 0.8850258330430745 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.53571429 0.82142857 0.75 0.78571429 0.82142857 0.79310345 0.68965517 0.82758621 0.65517241 0.79310345] mean value: 0.7472906403940887 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.68165025 0.87623153 0.82327586 0.85837438 0.80726601 0.86083744 0.84482759 0.89593596 0.77401478 0.82512315] mean value: 0.8247536945812808 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.45454545 0.76666667 0.67741935 0.73333333 0.67647059 0.74193548 0.68965517 0.8 0.59375 0.6969697 ] mean value: 0.6830745750873917 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.78 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.81883883 1.82968402 1.83166051 1.8007412 1.78529477 1.97610807 1.88550544 1.8663559 1.89368105 1.85617685] mean value: 1.854404664039612 key: score_time value: [0.0972929 0.09529185 0.09653831 0.09236741 0.10142279 0.10048246 0.09244108 0.09494305 0.10010195 0.09162307] mean value: 0.09625048637390136 key: test_mcc value: [0.96551724 0.96551724 1. 1. 0.92980296 0.96547546 0.96547546 0.8953202 0.96547546 0.93202124] mean value: 0.9584605246454952 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 1. 1. 0.96491228 0.98245614 0.98245614 0.94736842 0.98245614 0.96491228] mean value: 0.9789473684210526 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 1. 1. 0.96428571 0.98305085 0.98305085 0.94736842 0.98305085 0.96666667] mean value: 0.9792385625079648 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 0.96551724 1. 1. 
0.96428571 0.96666667 0.96666667 0.96428571 0.96666667 0.93548387] mean value: 0.9695089782297791 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 1. 1. 0.93103448 1. 1. ] mean value: 0.9895320197044335 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 1. 1. 0.96490148 0.98214286 0.98214286 0.9476601 0.98214286 0.96428571] mean value: 0.9788793103448277 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 1. 1. 0.93103448 0.96666667 0.96666667 0.9 0.96666667 0.93548387] mean value: 0.9597552836484984 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.95496798 0.96763968 0.97145748 0.96551633 0.96309352 1.08149958 0.96972513 0.97088814 1.0209651 1.06677747] mean value: 0.9932530403137207 key: score_time value: [0.24825764 0.25175595 0.20534968 0.27622366 0.27167201 0.19710779 0.2121675 0.23378062 0.21063542 0.2486136 ] mean value: 0.23555638790130615 key: test_mcc value: [0.96551724 0.89988258 1. 0.92980296 0.92980296 0.96547546 0.93202124 0.8953202 0.96547546 0.8951918 ] mean value: 0.9378489884746699 key: train_mcc value: [0.96907457 0.97289329 0.97672617 0.96907457 0.98069236 0.97289533 0.97672758 0.98057426 0.96907736 0.9922027 ] mean value: 0.9759938179304501 key: test_accuracy value: [0.98245614 0.94736842 1. 0.96491228 0.96491228 0.98245614 0.96491228 0.94736842 0.98245614 0.94736842] mean value: 0.968421052631579 key: train_accuracy value: [0.98440546 0.98635478 0.98830409 0.98440546 0.99025341 0.98635478 0.98830409 0.99025341 0.98440546 0.99610136] mean value: 0.9879142300194932 key: test_fscore value: [0.98245614 0.94915254 1. 0.96428571 0.96428571 0.98305085 0.96666667 0.94736842 0.98305085 0.94915254] mean value: 0.9689469436302621 key: train_fscore value: [0.98461538 0.98651252 0.98841699 0.98461538 0.99036609 0.98646035 0.98837209 0.99029126 0.98455598 0.99609375] mean value: 0.9880299808242159 key: test_precision value: [0.96551724 0.90322581 1. 
0.96428571 0.96428571 0.96666667 0.93548387 0.96428571 0.96666667 0.93333333] mean value: 0.9563750728322474 key: train_precision value: [0.97338403 0.97709924 0.98084291 0.97338403 0.98091603 0.97701149 0.98076923 0.98455598 0.97328244 0.99609375] mean value: 0.979733914221565 key: test_recall value: [1. 1. 1. 0.96428571 0.96428571 1. 1. 0.93103448 1. 0.96551724] mean value: 0.982512315270936 key: train_recall value: [0.99610895 0.99610895 0.99610895 0.99610895 1. 0.99609375 0.99609375 0.99609375 0.99609375 0.99609375] mean value: 0.996490454766537 key: test_roc_auc value: [0.98275862 0.94827586 1. 0.96490148 0.96490148 0.98214286 0.96428571 0.9476601 0.98214286 0.94704433] mean value: 0.9684113300492612 key: train_roc_auc value: [0.9843826 0.98633572 0.98828885 0.9843826 0.99023438 0.98637372 0.98831925 0.99026477 0.9844282 0.99610135] mean value: 0.9879111442120623 key: test_jcc value: [0.96551724 0.90322581 1. 
0.93103448 0.93103448 0.96666667 0.93548387 0.9 0.96666667 0.90322581] mean value: 0.9402855024100852 key: train_jcc value: [0.96969697 0.97338403 0.97709924 0.96969697 0.98091603 0.97328244 0.97701149 0.98076923 0.96958175 0.9922179 ] mean value: 0.9763656052640073 MCC on Blind test: 0.89 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02656937 0.01063704 0.01065159 0.01167321 0.01107359 0.0106473 0.01130176 0.01063609 0.01054382 0.01125193] mean value: 0.012498569488525391 key: score_time value: [0.00993228 0.00957012 0.01008916 0.00943708 0.00952315 0.00921941 0.00988579 0.00899458 0.00932455 0.00942326] mean value: 0.009539937973022461 key: test_mcc value: [0.65104858 0.58076493 0.7257422 0.65018988 0.61453202 0.54377353 0.44019762 0.43881637 0.61453202 0.68736396] mean value: 0.594696112208837 key: train_mcc value: [0.653245 0.66081987 0.64199455 0.65105088 0.66139765 0.67260512 0.66473119 0.64928315 0.64523042 0.6456446 ] mean value: 0.6546002422198144 key: test_accuracy value: [0.8245614 0.78947368 0.85964912 0.8245614 0.80701754 0.77192982 0.71929825 0.71929825 0.80701754 0.84210526] mean value: 0.7964912280701755 key: train_accuracy value: [0.82651072 0.83040936 0.82066277 0.8245614 
0.83040936 0.83625731 0.83235867 0.8245614 0.82261209 0.82261209] mean value: 0.8270955165692008 key: test_fscore value: [0.82758621 0.79310345 0.84615385 0.81481481 0.80701754 0.77966102 0.71428571 0.73333333 0.80701754 0.85245902] mean value: 0.7975432484822016 key: train_fscore value: [0.82917466 0.83106796 0.82509506 0.83146067 0.83428571 0.8372093 0.83137255 0.82213439 0.82261209 0.82533589] mean value: 0.8289748287731117 key: test_precision value: [0.8 0.76666667 0.91666667 0.84615385 0.79310345 0.76666667 0.74074074 0.70967742 0.82142857 0.8125 ] mean value: 0.7973604025953859 key: train_precision value: [0.81818182 0.82945736 0.80669145 0.80144404 0.81716418 0.83076923 0.83464567 0.832 0.82101167 0.81132075] mean value: 0.8202686182692108 key: test_recall value: [0.85714286 0.82142857 0.78571429 0.78571429 0.82142857 0.79310345 0.68965517 0.75862069 0.79310345 0.89655172] mean value: 0.8002463054187192 key: train_recall value: [0.84046693 0.83268482 0.84435798 0.86381323 0.85214008 0.84375 0.828125 0.8125 0.82421875 0.83984375] mean value: 0.8381900535019455 key: test_roc_auc value: [0.82512315 0.79002463 0.85837438 0.82389163 0.80726601 0.77155172 0.71982759 0.71859606 0.80726601 0.841133 ] mean value: 0.7963054187192118 key: train_roc_auc value: [0.82648346 0.83040491 0.82061649 0.82448474 0.83036691 0.83627189 0.83235044 0.82453794 0.82261521 0.82264561] mean value: 0.8270777602140078 key: test_jcc value: [0.70588235 0.65714286 0.73333333 0.6875 0.67647059 0.63888889 0.55555556 0.57894737 0.67647059 0.74285714] mean value: 0.6653048675610596 key: train_jcc value: [0.70819672 0.71096346 0.70226537 0.71153846 0.71568627 0.72 0.7114094 0.69798658 0.6986755 0.70261438] mean value: 0.7079336133605598 MCC on Blind test: 0.47 Accuracy on Blind test: 0.72 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.12568951 0.07826042 0.09557486 0.22482085 0.07972693 0.08724403 0.07112598 0.08922219 0.62100506 0.09891844] mean value: 0.15715882778167725 key: score_time value: [0.01179385 0.01150036 0.01145864 0.01169968 0.01133823 0.01128864 0.01081276 0.01294708 0.01268101 0.01360011] mean value: 0.011912035942077636 key: test_mcc value: [0.96547546 0.96551724 1. 0.96547546 0.92980296 0.96551724 0.96551724 0.86189955 0.96547546 0.93202124] mean value: 0.9516701836265801 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 1. 0.98245614 0.96491228 0.98245614 0.98245614 0.92982456 0.98245614 0.96491228] mean value: 0.975438596491228 key: train_accuracy value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.98245614 1. 0.98181818 0.96428571 0.98245614 0.98245614 0.92857143 0.98305085 0.96666667] mean value: 0.9753579441670431 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96551724 1. 1. 0.96428571 1. 1. 0.96296296 0.96666667 0.93548387] mean value: 0.9794916456262396 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 1. 1. 0.96428571 0.96428571 0.96551724 0.96551724 0.89655172 1. 1. ] mean value: 0.9720443349753695 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.98275862 1. 0.98214286 0.96490148 0.98275862 0.98275862 0.93041872 0.98214286 0.96428571] mean value: 0.9754310344827587 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96428571 0.96551724 1. 0.96428571 0.93103448 0.96551724 0.96551724 0.86666667 0.96666667 0.93548387] mean value: 0.9524974839769056 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0796392 0.06554699 0.07161856 0.04404068 0.07942057 0.05319357 0.07177877 0.09217787 0.07455873 0.07560158] mean value: 0.07075765132904052 key: score_time value: [0.01967144 0.0233593 0.01252985 0.01237559 0.01997495 0.01243973 0.02639699 0.02463031 0.02326989 0.01278424] mean value: 0.018743228912353516 key: test_mcc value: [0.85960591 0.86189955 0.8953202 0.80685836 0.8615634 1. 0.86189955 0.85960591 0.96547546 0.8615634 ] mean value: 0.8833791728556848 key: train_mcc value: [0.97277537 0.9688108 0.96497735 0.96883978 0.96883978 0.96884072 0.9688108 0.96509685 0.9610433 0.9610433 ] mean value: 0.9669078047012456 key: test_accuracy value: [0.92982456 0.92982456 0.94736842 0.89473684 0.92982456 1. 0.92982456 0.92982456 0.98245614 0.92982456] mean value: 0.9403508771929825 key: train_accuracy value: [0.98635478 0.98440546 0.98245614 0.98440546 0.98440546 0.98440546 0.98440546 0.98245614 0.98050682 0.98050682] mean value: 0.9834307992202729 key: test_fscore value: [0.92857143 0.93103448 0.94736842 0.88 0.92592593 1. 
0.92857143 0.93103448 0.98305085 0.93333333] mean value: 0.9388890350429616 key: train_fscore value: [0.98646035 0.9844358 0.98259188 0.98449612 0.98449612 0.9844358 0.984375 0.98259188 0.98054475 0.98054475] mean value: 0.9834972438136449 key: test_precision value: [0.92857143 0.9 0.93103448 1. 0.96153846 1. 0.96296296 0.93103448 0.96666667 0.90322581] mean value: 0.9485034291708374 key: train_precision value: [0.98076923 0.9844358 0.97692308 0.98069498 0.98069498 0.98062016 0.984375 0.97318008 0.97674419 0.97674419] mean value: 0.9795181670507774 key: test_recall value: [0.92857143 0.96428571 0.96428571 0.78571429 0.89285714 1. 0.89655172 0.93103448 1. 0.96551724] mean value: 0.9328817733990148 key: train_recall value: [0.9922179 0.9844358 0.98832685 0.98832685 0.98832685 0.98828125 0.984375 0.9921875 0.984375 0.984375 ] mean value: 0.9875227991245137 key: test_roc_auc value: [0.92980296 0.93041872 0.9476601 0.89285714 0.92918719 1. 0.93041872 0.92980296 0.98214286 0.92918719] mean value: 0.9401477832512316 key: train_roc_auc value: [0.98634332 0.9844054 0.98244467 0.9843978 0.9843978 0.984413 0.9844054 0.98247507 0.98051435 0.98051435] mean value: 0.9834311162451362 key: test_jcc value: [0.86666667 0.87096774 0.9 0.78571429 0.86206897 1. 
0.86666667 0.87096774 0.96666667 0.875 ] mean value: 0.8864718735102495 key: train_jcc value: [0.97328244 0.96934866 0.96577947 0.96946565 0.96946565 0.96934866 0.96923077 0.96577947 0.96183206 0.96183206] mean value: 0.9675364885195069 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02528572 0.01048207 0.01013875 0.01027489 0.01146173 0.01026607 0.01069283 0.01048994 0.01134014 0.01128221] mean value: 0.012171435356140136 key: score_time value: [0.0208075 0.00920558 0.00884104 0.00893402 0.01020646 0.00880122 0.00880194 0.00962806 0.00974655 0.00974631] mean value: 0.010471868515014648 key: test_mcc value: [0.65018988 0.66511153 0.54433498 0.72064772 0.68472906 0.7257422 0.57881773 0.68736396 0.79682005 0.75462449] mean value: 0.6808381620716417 key: train_mcc value: [0.66541423 0.69722643 0.67753494 0.674131 0.69108402 0.68465245 0.69245402 0.70478447 0.70511142 0.68889059] mean value: 0.6881283571497584 key: test_accuracy value: [0.8245614 0.8245614 0.77192982 0.85964912 0.84210526 0.85964912 0.78947368 0.84210526 0.89473684 0.87719298] mean value: 0.8385964912280701 key: train_accuracy value: [0.83235867 0.84795322 0.83820663 0.83625731 0.84405458 0.84210526 0.8460039 0.85185185 0.85185185 
0.84405458] mean value: 0.8434697855750487 key: test_fscore value: [0.81481481 0.83870968 0.77192982 0.85185185 0.84210526 0.87096774 0.79310345 0.85245902 0.90322581 0.88135593] mean value: 0.8420523377065111 key: train_fscore value: [0.8365019 0.85283019 0.84310019 0.84210526 0.85130112 0.84452975 0.84836852 0.85551331 0.85606061 0.84732824] mean value: 0.8477639088128366 key: test_precision value: [0.84615385 0.76470588 0.75862069 0.88461538 0.82758621 0.81818182 0.79310345 0.8125 0.84848485 0.86666667] mean value: 0.8220618791283092 key: train_precision value: [0.81784387 0.82783883 0.81985294 0.81454545 0.81494662 0.83018868 0.83396226 0.83333333 0.83088235 0.82835821] mean value: 0.8251752547574799 key: test_recall value: [0.78571429 0.92857143 0.78571429 0.82142857 0.85714286 0.93103448 0.79310345 0.89655172 0.96551724 0.89655172] mean value: 0.8661330049261083 key: train_recall value: [0.85603113 0.87937743 0.86770428 0.87159533 0.89105058 0.859375 0.86328125 0.87890625 0.8828125 0.8671875 ] mean value: 0.8717321254863813 key: test_roc_auc value: [0.82389163 0.82635468 0.77216749 0.85899015 0.84236453 0.85837438 0.78940887 0.841133 0.89347291 0.87684729] mean value: 0.8383004926108375 key: train_roc_auc value: [0.83231244 0.84789184 0.83814902 0.83618829 0.84396279 0.84213886 0.84603751 0.85190449 0.85191209 0.84409959] mean value: 0.8434596911478599 key: test_jcc value: [0.6875 0.72222222 0.62857143 0.74193548 0.72727273 0.77142857 0.65714286 0.74285714 0.82352941 0.78787879] mean value: 0.7290338633009411 key: train_jcc value: [0.71895425 0.74342105 0.72875817 0.72727273 0.74110032 0.73089701 0.73666667 0.74750831 0.74834437 0.73509934] mean value: 0.7358022212720111 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02382302 0.02296519 0.02932572 0.02991176 0.02761936 0.02511883 0.02549958 0.02771902 0.02718163 0.02515793] mean value: 0.026432204246520995 key: score_time value: [0.01161408 0.01145077 0.01257825 0.01203704 0.01204038 0.012815 0.01315665 0.01242328 0.01236939 0.01229191] mean value: 0.012277674674987794 key: test_mcc value: [0.80685836 0.86189955 0.93202124 0.9321832 0.8951918 0.96551724 0.79161589 0.86189955 0.96551724 0.89952865] mean value: 0.8912232712036543 key: train_mcc value: [0.92752109 0.96884072 0.97271705 0.95393799 0.97271663 0.96491975 0.95729513 0.96907736 0.94209644 0.94273737] mean value: 0.9571859535336262 key: test_accuracy value: [0.89473684 0.92982456 0.96491228 0.96491228 0.94736842 0.98245614 0.89473684 0.92982456 0.98245614 0.94736842] mean value: 0.943859649122807 key: train_accuracy value: [0.96296296 0.98440546 0.98635478 0.97660819 0.98635478 0.98245614 0.9785575 0.98440546 0.97076023 0.97076023] mean value: 0.9783625730994152 key: test_fscore value: [0.88 0.93103448 0.96296296 0.96551724 0.94545455 0.98245614 0.89285714 0.92857143 0.98245614 0.95081967] mean value: 0.9422129756816913 key: train_fscore value: [0.96192385 0.984375 0.98635478 0.97709924 0.98640777 0.98245614 0.97830375 0.98455598 0.97017893 0.97142857] mean value: 0.9783083997466665 key: test_precision value: [1. 0.9 1. 
0.93333333 0.96296296 1. 0.92592593 0.96296296 1. 0.90625 ] mean value: 0.9591435185185185 key: train_precision value: [0.99173554 0.98823529 0.98828125 0.9588015 0.98449612 0.98054475 0.98804781 0.97328244 0.98785425 0.94795539] mean value: 0.978923434340754 key: test_recall value: [0.78571429 0.96428571 0.92857143 1. 0.92857143 0.96551724 0.86206897 0.89655172 0.96551724 1. ] mean value: 0.929679802955665 key: train_recall value: [0.93385214 0.98054475 0.9844358 0.99610895 0.98832685 0.984375 0.96875 0.99609375 0.953125 0.99609375] mean value: 0.9781705982490272 key: test_roc_auc value: [0.89285714 0.93041872 0.96428571 0.96551724 0.94704433 0.98275862 0.8953202 0.93041872 0.98275862 0.94642857] mean value: 0.9437807881773399 key: train_roc_auc value: [0.96301982 0.984413 0.98635852 0.9765701 0.98635092 0.98245987 0.97853842 0.9844282 0.97072592 0.97080952] mean value: 0.9783674306906615 key: test_jcc value: [0.78571429 0.87096774 0.92857143 0.93333333 0.89655172 0.96551724 0.80645161 0.86666667 0.96551724 0.90625 ] mean value: 0.8925541276020976 key: train_jcc value: [0.92664093 0.96923077 0.97307692 0.95522388 0.97318008 0.96551724 0.95752896 0.96958175 0.94208494 0.94444444] mean value: 0.9576509910661071 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01926589 0.01759005 0.02008891 0.02133679 0.01895285 0.02249408 0.01945829 0.02094722 0.02044463 0.0228014 ] mean value: 0.020338010787963868 key: score_time value: [0.01128817 0.01206374 0.01268244 0.01210356 0.01209688 0.01208711 0.01209283 0.0120666 0.01214123 0.01211452] mean value: 0.012073707580566407 key: test_mcc value: [0.9321832 0.77903565 0.86189955 0.92980296 0.76550573 0.89988258 0.79161589 0.77903565 0.77903565 0.79161589] mean value: 0.8309612752998973 key: train_mcc value: [0.95018762 0.86475876 0.8926403 0.9454189 0.79780407 0.90819008 0.97271663 0.81742544 0.71471052 0.97307046] mean value: 0.8836922787721689 key: test_accuracy value: [0.96491228 0.87719298 0.92982456 0.96491228 0.87719298 0.94736842 0.89473684 0.87719298 0.87719298 0.89473684] mean value: 0.9105263157894736 key: train_accuracy value: [0.97465887 0.92787524 0.94346979 0.97270955 0.88888889 0.95321637 0.98635478 0.9005848 0.83820663 0.98635478] mean value: 0.9372319688109161 key: test_fscore value: [0.96551724 0.88888889 0.93103448 0.96428571 0.8627451 0.94545455 0.89285714 0.8627451 0.8627451 0.89285714] mean value: 0.9069130452599012 key: train_fscore value: [0.9752381 0.93284936 0.946593 0.97276265 0.87527352 0.9516129 0.98630137 0.88937093 0.80652681 0.98613861] mean value: 0.9322667256993225 key: test_precision value: [0.93333333 0.8 0.9 0.96428571 0.95652174 1. 0.92592593 1. 1. 0.92592593] mean value: 0.9405992638601335 key: train_precision value: [0.95522388 0.87414966 0.8986014 0.97276265 1. 0.98333333 0.98823529 1. 1. 1. 
] mean value: 0.9672306212427736 key: test_recall value: [1. 1. 0.96428571 0.96428571 0.78571429 0.89655172 0.86206897 0.75862069 0.75862069 0.86206897] mean value: 0.8852216748768473 key: train_recall value: [0.99610895 1. 1. 0.97276265 0.77821012 0.921875 0.984375 0.80078125 0.67578125 0.97265625] mean value: 0.9102550462062257 key: test_roc_auc value: [0.96551724 0.87931034 0.93041872 0.96490148 0.87561576 0.94827586 0.8953202 0.87931034 0.87931034 0.8953202 ] mean value: 0.9113300492610837 key: train_roc_auc value: [0.97461697 0.92773438 0.94335938 0.97270945 0.88910506 0.9531554 0.98635092 0.90039062 0.83789062 0.98632812] mean value: 0.9371640928988327 key: test_jcc value: [0.93333333 0.8 0.87096774 0.93103448 0.75862069 0.89655172 0.80645161 0.75862069 0.75862069 0.80645161] mean value: 0.8320652576937337 key: train_jcc value: [0.95167286 0.87414966 0.8986014 0.9469697 0.77821012 0.90769231 0.97297297 0.80078125 0.67578125 0.97265625] mean value: 0.8779487765285371 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.19932365 0.18372822 0.17802 0.17789721 0.17805314 0.17863345 0.17918444 0.17962122 0.17842484 0.17965174] mean value: 0.18125379085540771 key: score_time value: [0.01686788 0.0161252 0.01570821 0.01544094 0.01524329 0.01548839 0.0152936 0.0156951 0.01531601 0.01533294] mean value: 0.01565115451812744 key: test_mcc value: [0.8951918 0.96551724 1. 0.96547546 0.92980296 0.96551724 0.96551724 0.92980296 0.92980296 0.93202124] mean value: 0.9478649091676867 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.98245614 1. 0.98245614 0.96491228 0.98245614 0.98245614 0.96491228 0.96491228 0.96491228] mean value: 0.9736842105263157 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94545455 0.98245614 1. 0.98181818 0.96428571 0.98245614 0.98245614 0.96551724 0.96551724 0.96666667] mean value: 0.973662801203636 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 0.96551724 1. 1. 
0.96428571 1. 1. 0.96551724 0.96551724 0.93548387] mean value: 0.975928427235435 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 1. 1. 0.96428571 0.96428571 0.96551724 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.9719211822660099 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94704433 0.98275862 1. 0.98214286 0.96490148 0.98275862 0.98275862 0.96490148 0.96490148 0.96428571] mean value: 0.9736453201970444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89655172 0.96551724 1. 0.96428571 0.93103448 0.96551724 0.96551724 0.93333333 0.93333333 0.93548387] mean value: 0.9490574182954605 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.94 Accuracy on Blind test: 0.97 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05743885 0.07335186 0.06502676 0.06323075 0.06822038 0.06512856 0.06676459 0.08469748 0.08225036 0.09285426] mean value: 0.07189638614654541 key: score_time value: [0.01954103 0.01935101 0.02997541 0.0232482 0.01905513 0.02338171 0.02898955 0.04204869 0.02914691 0.03998756] mean value: 0.027472519874572755 key: test_mcc value: [0.92980296 1. 1. 0.93202124 0.8951918 1. 0.92980296 0.89988258 0.92980296 0.8951918 ] mean value: 0.9411696291544969 key: train_mcc value: [0.99610895 0.99610895 1. 0.99223298 0.99223298 1. 0.99610895 1. 0.99610889 1. ] mean value: 0.9968901699256791 key: test_accuracy value: [0.96491228 1. 1. 0.96491228 0.94736842 1. 0.96491228 0.94736842 0.96491228 0.94736842] mean value: 0.9701754385964912 key: train_accuracy value: [0.99805068 0.99805068 1. 0.99610136 0.99610136 1. 0.99805068 1. 0.99805068 1. ] mean value: 0.9984405458089668 key: test_fscore value: [0.96428571 1. 1. 0.96296296 0.94545455 1. 0.96551724 0.94545455 0.96551724 0.94915254] mean value: 0.9698344793289271 key: train_fscore value: [0.99805068 0.99805068 1. 0.99609375 0.99609375 1. 0.99805068 1. 0.99804305 1. ] mean value: 0.9984382599621199 key: test_precision value: [0.96428571 1. 1. 1. 0.96296296 1. 0.96551724 1. 0.96551724 0.93333333] mean value: 0.9791616493340631 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.99610895 1. 1. 1. ] mean value: 0.9996108949416342 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis]
key: test_recall value: [0.96428571 1. 1. 0.92857143 0.92857143 1. 0.96551724 0.89655172 0.96551724 0.96551724] mean value: 0.9614532019704434 key: train_recall value: [0.99610895 0.99610895 1. 0.9922179 0.9922179 1. 1. 1. 0.99609375 1. ] mean value: 0.9972747446498055 key: test_roc_auc value: [0.96490148 1. 1. 0.96428571 0.94704433 1. 0.96490148 0.94827586 0.96490148 0.94704433] mean value: 0.9701354679802956 key: train_roc_auc value: [0.99805447 0.99805447 1. 0.99610895 0.99610895 1. 0.99805447 1. 0.99804688 1. ] mean value: 0.9984428197957198 key: test_jcc value: [0.93103448 1. 1. 0.92857143 0.89655172 1. 0.93333333 0.89655172 0.93333333 0.90322581] mean value: 0.9422601832724191 key: train_jcc value: [0.99610895 0.99610895 1. 0.9922179 0.9922179 1. 0.99610895 1. 0.99609375 1. 
] mean value: 0.9968856395914397 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.18004179 0.19584107 0.23010325 0.21314287 0.2015121 0.20719767 0.16908312 0.20237637 0.17533731 0.1983788 ] mean value: 0.19730143547058104 key: score_time value: [0.02584887 0.02601671 0.02588081 0.02573323 0.02579021 0.02653241 0.01559448 0.01567507 0.02591038 0.02642345] mean value: 0.023940563201904297 key: test_mcc value: [0.78940887 0.92980296 0.82880708 0.82490815 0.75492611 0.82942474 0.79778885 0.75492611 0.78940887 0.68472906] mean value: 0.7984130796385527 key: train_mcc value: [0.9766081 0.97663814 0.98057426 0.96883978 0.98831165 0.98069236 0.9844054 0.97289329 0.98443509 0.98051405] mean value: 0.979391211760016 key: test_accuracy value: [0.89473684 0.96491228 0.9122807 0.9122807 0.87719298 0.9122807 0.89473684 0.87719298 0.89473684 0.84210526] mean value: 0.8982456140350877 key: train_accuracy value: [0.98830409 0.98830409 0.99025341 0.98440546 0.99415205 0.99025341 0.99220273 0.98635478 0.99220273 0.99025341] mean value: 0.9896686159844055 key: test_fscore value: [0.89285714 0.96428571 0.90566038 0.90909091 0.87719298 
0.90909091 0.88888889 0.87719298 0.89655172 0.84210526] mean value: 0.8962916893780161 key: train_fscore value: [0.98832685 0.98828125 0.99021526 0.98449612 0.99415205 0.99013807 0.9921875 0.98619329 0.99215686 0.99021526] mean value: 0.9896362521131238 key: test_precision value: [0.89285714 0.96428571 0.96 0.92592593 0.86206897 0.96153846 0.96 0.89285714 0.89655172 0.85714286] mean value: 0.9173227934262417 key: train_precision value: [0.98832685 0.99215686 0.99606299 0.98069498 0.99609375 1. 0.9921875 0.99601594 0.99606299 0.99215686] mean value: 0.9929758724941152 key: test_recall value: [0.89285714 0.96428571 0.85714286 0.89285714 0.89285714 0.86206897 0.82758621 0.86206897 0.89655172 0.82758621] mean value: 0.8775862068965518 key: train_recall value: [0.98832685 0.9844358 0.9844358 0.98832685 0.9922179 0.98046875 0.9921875 0.9765625 0.98828125 0.98828125] mean value: 0.9863524440661479 key: test_roc_auc value: [0.89470443 0.96490148 0.91133005 0.91194581 0.87746305 0.91317734 0.89593596 0.87746305 0.89470443 0.84236453] mean value: 0.8983990147783252 key: train_roc_auc value: [0.98830405 0.98831165 0.99026477 0.9843978 0.99415582 0.99023438 0.9922027 0.98633572 0.9921951 0.99024957] mean value: 0.9896651568579766 key: test_jcc value: [0.80645161 0.93103448 0.82758621 0.83333333 0.78125 0.83333333 0.8 0.78125 0.8125 0.72727273] mean value: 0.8134011696497793 key: train_jcc value: [0.97692308 0.97683398 0.98062016 0.96946565 0.98837209 0.98046875 0.98449612 0.97276265 0.9844358 0.98062016] mean value: 0.9794998423323565 MCC on Blind test: 0.45 Accuracy on Blind test: 0.75 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.73218632 0.71230316 0.71463799 0.71179533 0.70079851 0.72155094 0.72182274 0.72095418 0.71527052 0.71961784] mean value: 0.7170937538146973 key: score_time value: [0.00951099 0.00945258 0.01003933 0.01020122 0.00973988 0.00968409 0.00953937 0.00969291 0.01006198 0.01011348] mean value: 0.009803581237792968 key: test_mcc value: [0.92980296 0.96551724 1. 0.96547546 0.92980296 1. 0.96551724 0.89988258 0.96547546 0.93202124] mean value: 0.9553495126484416 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.98245614 1. 0.98245614 0.96491228 1. 0.98245614 0.94736842 0.98245614 0.96491228] mean value: 0.9771929824561403 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.98245614 1. 0.98181818 0.96428571 1. 0.98245614 0.94545455 0.98305085 0.96666667] mean value: 0.9770473950670204 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.96551724 1. 1. 0.96428571 1. 1. 1. 0.96666667 0.93548387] mean value: 0.9796239207585148 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 1. 1. 0.96428571 0.96428571 1. 0.96551724 0.89655172 1. 1. ] mean value: 0.9754926108374384 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96490148 0.98275862 1. 0.98214286 0.96490148 1. 
0.98275862 0.94827586 0.98214286 0.96428571] mean value: 0.977216748768473 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.96551724 1. 0.96428571 0.93103448 1. 0.96551724 0.89655172 0.96666667 0.93548387] mean value: 0.9556091424333916 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03368616 0.03474903 0.03299999 0.04718065 0.03263092 0.03254867 0.0505898 0.04946899 0.05832887 0.04263043] mean value: 0.04148135185241699 key: score_time value: [0.01290917 0.01676774 0.01531792 0.02634883 0.01512575 0.01687789 0.03070521 0.01287818 0.01288342 0.01452231] mean value: 0.017433643341064453 key: test_mcc value: [0.60497779 0.75808552 0.78940887 0.553659 0.65104858 0.54592083 0.69397486 0.58562417 0.75462449 0.54433498] mean value: 0.6481659094831543 key: train_mcc value: [0.89961107 0.92485373 0.910898 0.85981783 0.9321313 0.88066244 0.82869394 0.81410875 0.95018762 0.89587763] mean value: 0.8896842312566976 key: test_accuracy value: [0.78947368 0.87719298 0.89473684 0.77192982 0.8245614 0.77192982 0.84210526 0.78947368 0.87719298 0.77192982] mean value: 0.8210526315789474 key: train_accuracy value: [0.94931774 0.96101365 0.95516569 0.92787524
0.96491228 0.93957115 0.91423002 0.89863548 0.97465887 0.94736842] mean value: 0.9432748538011696 key: test_fscore value: [0.8125 0.86792453 0.89285714 0.78688525 0.82758621 0.78688525 0.85714286 0.77777778 0.88135593 0.77192982] mean value: 0.8262844761544288 key: train_fscore value: [0.95057034 0.95951417 0.95445545 0.93135436 0.96370968 0.94117647 0.91505792 0.88695652 0.9740519 0.94589178] mean value: 0.9422738582295507 key: test_precision value: [0.72222222 0.92 0.89285714 0.72727273 0.8 0.75 0.79411765 0.84 0.86666667 0.78571429] mean value: 0.8098850691791868 key: train_precision value: [0.92936803 1. 0.97177419 0.89007092 1. 0.91512915 0.90458015 1. 0.99591837 0.97119342] mean value: 0.9578034232222047 key: test_recall value: [0.92857143 0.82142857 0.89285714 0.85714286 0.85714286 0.82758621 0.93103448 0.72413793 0.89655172 0.75862069] mean value: 0.8495073891625615 key: train_recall value: [0.97276265 0.92217899 0.93774319 0.9766537 0.92996109 0.96875 0.92578125 0.796875 0.953125 0.921875 ] mean value: 0.9305705860894942 key: test_roc_auc value: [0.79187192 0.87623153 0.89470443 0.77339901 0.82512315 0.77093596 0.84051724 0.79064039 0.87684729 0.77216749] mean value: 0.8212438423645321 key: train_roc_auc value: [0.94927195 0.96108949 0.95519972 0.92777997 0.96498054 0.93962792 0.91425249 0.8984375 0.97461697 0.94731882] mean value: 0.9432575389105058 key: test_jcc value: [0.68421053 0.76666667 0.80645161 0.64864865 0.70588235 0.64864865 0.75 0.63636364 0.78787879 0.62857143] mean value: 0.7063322308938008 key: train_jcc value: [0.9057971 0.92217899 0.91287879 0.87152778 0.92996109 0.88888889 0.84341637 0.796875 0.94941634 0.8973384 ] mean value: 0.891827874937678 MCC on Blind test: 0.16 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01642084 0.016325 0.02404594 0.03962636 0.03637481 0.03999496 0.04035878 0.02521992 0.02431512 0.03965163] mean value: 0.030233335494995118 key: score_time value: [0.01417303 0.01231766 0.0191412 0.01912379 0.01904559 0.0190649 0.01907992 0.01245284 0.01895809 0.01929522] mean value: 0.01726522445678711 key: test_mcc value: [0.8951918 0.89988258 0.96551724 0.96547546 0.92980296 0.96551724 0.8951918 0.85960591 0.96547546 0.8615634 ] mean value: 0.9203223848137981 key: train_mcc value: [0.96127477 0.97672617 0.9611292 0.96892768 0.96509421 0.96127828 0.94967392 0.95770742 0.95718129 0.95718129] mean value: 0.9616174236606619 key: test_accuracy value: [0.94736842 0.94736842 0.98245614 0.98245614 0.96491228 0.98245614 0.94736842 0.92982456 0.98245614 0.92982456] mean value: 0.9596491228070175 key: train_accuracy value: [0.98050682 0.98830409 0.98050682 0.98440546 0.98245614 0.98050682 0.97465887 0.9785575 0.9785575 0.9785575 ] mean value: 0.9807017543859649 key: test_fscore value: [0.94545455 0.94915254 0.98245614 0.98181818 0.96428571 0.98245614 0.94915254 0.93103448 0.98305085 0.93333333] mean value: 0.960219447055554
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_sl.py:191: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore value: [0.98076923 0.98841699 0.98069498 0.98455598 0.98265896 0.98069498 0.97495183 0.97888676 0.97864078 0.97864078] mean value: 0.980891126474896 key: test_precision value: [0.96296296 0.90322581 0.96551724 1. 0.96428571 1. 0.93333333 0.93103448 0.96666667 0.90322581] mean value: 0.9530252014289834 key: train_precision value: [0.96958175 0.98084291 0.97318008 0.97701149 0.97328244 0.96946565 0.96197719 0.96226415 0.97297297 0.97297297] mean value: 0.9713551606612233 key: test_recall value: [0.92857143 1. 1. 0.96428571 0.96428571 0.96551724 0.96551724 0.93103448 1.
0.96551724] mean value: 0.9684729064039409 key: train_recall value: [0.9922179 0.99610895 0.98832685 0.9922179 0.9922179 0.9921875 0.98828125 0.99609375 0.984375 0.984375 ] mean value: 0.9906401994163424 key: test_roc_auc value: [0.94704433 0.94827586 0.98275862 0.98214286 0.96490148 0.98275862 0.94704433 0.92980296 0.98214286 0.92918719] mean value: 0.9596059113300494 key: train_roc_auc value: [0.98048395 0.98828885 0.98049155 0.9843902 0.98243707 0.98052955 0.97468537 0.97859162 0.97856882 0.97856882] mean value: 0.9807035809824902 key: test_jcc value: [0.89655172 0.90322581 0.96551724 0.96428571 0.93103448 0.96551724 0.90322581 0.87096774 0.96666667 0.875 ] mean value: 0.9241992425446263 key: train_jcc value: [0.96226415 0.97709924 0.96212121 0.96958175 0.96590909 0.96212121 0.95112782 0.95864662 0.9581749 0.9581749 ] mean value: 0.9625220897761719 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.15240073 0.28842759 0.29213166 0.31818414 0.29039454 0.24698877 0.31169581 0.2405808 0.31258917 0.31262183] mean value: 0.276601505279541 key: score_time value: [0.01224947 0.0193646 0.02494502 0.01928473 0.02496862 0.02136517 0.01929235 0.01257014 0.02258611 0.01934028] mean value: 0.01959664821624756 key: test_mcc value: [0.8951918 0.8953202 0.96551724 0.96547546 0.92980296 0.96551724 0.8951918 0.85960591 0.96547546 0.8615634 ] mean value: 0.9198661467253358 key: train_mcc value: [0.96127477 0.98051435 0.9611292 0.96892768 0.96509421 0.96127828 0.94967392 0.95770742 0.95718129 0.95718129] mean value: 0.961996241747861 key: test_accuracy value: [0.94736842 0.94736842 0.98245614 0.98245614 0.96491228 0.98245614 0.94736842 0.92982456 0.98245614 0.92982456] mean value: 0.9596491228070175 key: train_accuracy value: [0.98050682 0.99025341 0.98050682 0.98440546 0.98245614 0.98050682 0.97465887 0.9785575 0.9785575 0.9785575 ] mean value: 0.980896686159844 key: test_fscore value: [0.94545455 0.94736842 0.98245614 0.98181818 0.96428571 0.98245614 0.94915254 0.93103448 0.98305085 0.93333333] mean value: 0.960041034923529 key: train_fscore value: [0.98076923 0.99025341 0.98069498 0.98455598 0.98265896 0.98069498 0.97495183 0.97888676 0.97864078 0.97864078] mean value: 0.9810747687638014 key: test_precision value: [0.96296296 0.93103448 0.96551724 1. 0.96428571 1. 
0.93333333 0.93103448 0.96666667 0.90322581] mean value: 0.9558060690596841 key: train_precision value: [0.96958175 0.9921875 0.97318008 0.97701149 0.97328244 0.96946565 0.96197719 0.96226415 0.97297297 0.97297297] mean value: 0.9724896194734839 key: test_recall value: [0.92857143 0.96428571 1. 0.96428571 0.96428571 0.96551724 0.96551724 0.93103448 1. 0.96551724] mean value: 0.9649014778325123 key: train_recall value: [0.9922179 0.98832685 0.98832685 0.9922179 0.9922179 0.9921875 0.98828125 0.99609375 0.984375 0.984375 ] mean value: 0.9898619892996109 key: test_roc_auc value: [0.94704433 0.9476601 0.98275862 0.98214286 0.96490148 0.98275862 0.94704433 0.92980296 0.98214286 0.92918719] mean value: 0.9595443349753695 key: train_roc_auc value: [0.98048395 0.99025717 0.98049155 0.9843902 0.98243707 0.98052955 0.97468537 0.97859162 0.97856882 0.97856882] mean value: 0.9809004134241245 key: test_jcc value: [0.89655172 0.9 0.96551724 0.96428571 0.93103448 0.96551724 0.90322581 0.87096774 0.96666667 0.875 ] mean value: 0.923876661899465 key: train_jcc value: [0.96226415 0.98069498 0.96212121 0.96958175 0.96590909 0.96212121 0.95112782 0.95864662 0.9581749 0.9581749 ] mean value: 0.9628816641815479 MCC on Blind test: 0.82 Accuracy on Blind test: 0.92